Run on:        December 4, 2005, 09:57:24 ; Search time 187 Seconds
                                          (without alignments)
                                          1184.208 Million cell updates/sec

Title:         US-09-771-312-2
Perfect score: 2694
Sequence:      1 MEELVHDLVSALEESSEQAR.......GFPLPKSTSATTTPNAGKSA 504

Scoring table: BLOSUM62
               Gapop 10.0 , Gapext 0.5

Searched:      2443163 seqs, 439378781 residues

Total number of hits satisfying chosen parameters: 2443163

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
Listing first 45 summaries

Database :     A_Geneseq_21:*
                1: geneseqp1980s:*
                2: geneseqp1990s:*
                3: geneseqp2000s:*
                4: geneseqp2001s:*
                5: geneseqp2002s:*
                6: geneseqp2003as:*
                7: geneseqp2003bs:*
                8: geneseqp2004s:*
                9: geneseqp2005s:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
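The report says only that Pred. No. is derived from the total score distribution. It does not say how. One common way to do this is to fit an extreme-value (Gumbel) distribution to the scores of all database sequences, then multiply the fitted tail probability by the number of sequences searched. The sketch below shows that approach. It is not GenCore's documented procedure. The helper names (`fit_gumbel`, `predicted_number`, `sample_gumbel`), the method-of-moments fit and the synthetic background scores are all assumptions for illustration. Only the database size (2443163 sequences) comes from the header.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel(scores):
    """Method-of-moments fit of a Gumbel (maximum) distribution.

    Returns (mu, beta) such that P(S >= s) = 1 - exp(-exp(-(s - mu) / beta)).
    """
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)
    beta = sd * math.sqrt(6.0) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def predicted_number(score, mu, beta, n_db):
    """Expected number of database sequences scoring >= score by chance."""
    z = math.exp(-(score - mu) / beta)
    tail = -math.expm1(-z)  # 1 - exp(-z), computed without cancellation for tiny z
    return n_db * tail


def sample_gumbel(rng, mu, beta, n):
    """Draw n synthetic background scores from Gumbel(mu, beta)."""
    out = []
    for _ in range(n):
        e = 0.0
        while e == 0.0:  # guard against log(0)
            e = rng.expovariate(1.0)  # Exp(1); -log(Exp(1)) is standard Gumbel
        out.append(mu - beta * math.log(e))
    return out


rng = random.Random(0)
background = sample_gumbel(rng, mu=30.0, beta=4.0, n=100_000)
mu_hat, beta_hat = fit_gumbel(background)
for s in (40.0, 60.0, 100.0):
    print(s, predicted_number(s, mu_hat, beta_hat, n_db=2_443_163))
```

With real output, you would pass the raw score of every database sequence to `fit_gumbel`. Maximum-likelihood fitting and excluding the high-scoring true hits before fitting are common refinements that this sketch leaves out.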
RESULT 1
AAU06524

ID   AAU06524 standard; protein; 504 AA.
XX
AC   AAU06524;
XX
DT   24-OCT-2001  (first entry)
XX
DE   Prostate and testis-related gene 84P2A9 encoded protein.
XX
KW   84P2A9-related protein; prostate; testis; tissue; cancer; leukaemia;
KW   tumour; kidney; brain; bone; skin; ovary; breast; pancreas; colon; lung;
KW   cytostatic; gene therapy; antibody therapy; ribozyme; serum; blood;
KW   single chain monoclonal antibody; urine.
XX
OS   Homo sapiens.
XX
PN   WO200155391-A2.
XX
PD   02-AUG-2001.
XX
PF   26-JAN-2001; 2001WO-US002651.
XX
PR   26-JAN-2000; 2000US-0178560P.
XX
PA   (UROG-) UROGENESYS INC.
XX
PI   Jakobovits A, Afar DEH, Challita-Eid PM, Levin E, Mitchell SC;
PI   Hubert RS;
XX
DR   WPI; 2001-502631/55.
DR   N-PSDB; AAS11663.
XX
PT   New 84P2A9 gene and its encoded protein, useful for diagnosing and
PT   treating cancer, e.g. leukemia and cancer of the prostate, testis,
PT   kidney, brain or bone, or for eliciting an immune response.
XX
PS   Claim 13; Fig 2; 149pp; English.
XX
CC   The polypeptide sequences represent the 84P2A9-related protein and
CC   peptide fragments of the protein. 84P2A9 exhibits prostate and testis
CC   specific expression in normal adult tissue, but it is also aberrantly
CC   expressed in many cancers including leukaemia and tumours of the
CC   prostate, testis, kidney, brain, bone, skin, ovary, breast, pancreas,
CC   colon and lung. The 84P2A9 polynucleotide, its related protein and
CC   peptide fragments and specific PCR primers are therefore useful for
CC   diagnosing and treating cancer. A vector comprising a polynucleotide
CC   which encodes a single chain monoclonal antibody, that immunospecifically
CC   binds to an 84P2A9-related protein, and a ribozyme capable of cleaving a
CC   polynucleotide having the 84P2A9 coding sequence, are both useful in the
CC   preparation of a composition for treating a patient with a cancer that
CC   expresses 84P2A9. The sequences can be used in diagnostic methods to
CC   monitor the level of 84P2A9 gene products in serum, blood, urine and
CC   tissue and to thereby detect the presence of cancerous cells.
XX
SQ   Sequence 504 AA;

Query Match 100.0%; Score 2694; DB 4; Length 504;
Best Local Similarity 100.0%; Pred. No. 3.9e-228;
Matches 504; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy    1 MEELVHDLVSALEESSEQARGGFAETGDHSRSISCPLKRQARKRRGRKRRSYNVHHPWET 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    1 MEELVHDLVSALEESSEQARGGFAETGDHSRSISCPLKRQARKRRGRKRRSYNVHHPWET 60

Qy   61 GHCLSEGSDSSLEEPSKDYRENHNNNKKDHSDSDDQMLVAKRRPSSNLNNNVRGKRPLWH 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   61 GHCLSEGSDSSLEEPSKDYRENHNNNKKDHSDSDDQMLVAKRRPSSNLNNNVRGKRPLWH 120

Qy  121 ESDFAVDNVGNRTLRRRRKVKRMAVDLPQDISNKRTMTQPPEGCRDQDMDSDRAYQYQEF 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  121 ESDFAVDNVGNRTLRRRRKVKRMAVDLPQDISNKRTMTQPPEGCRDQDMDSDRAYQYQEF 180

Qy  181 TKNKVKKRKLKIIRQGPKIQDEGWLESEETNQTNKDKMECEEQKVSDELMSESDSSSLS 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  181 TKNKVKKRKLKIIRQGPKIQDEGWLESEETNQTNKDKMECEEQKVSDELMSESDSSSLS 240

Qy  241 STDAGLFTNDEGRQGDDEQSDWFYEKESGGACGITGWPWWEKEDPTELDKNVPDPVFES 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  241 STDAGLFTNDEGRQGDDEQSDWFYEKESGGACGITGWPWWEKEDPTELDKNVPDPVFES 300

Qy  301 ILTGSFPLMSHPSRRGFQARLSRLHGMSSKNIKKSGGTPTSMVPIPGPVGNKRMVHFSPD 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  301 ILTGSFPLMSHPSRRGFQARLSRLHGMSSKNIKKSGGTPTSMVPIPGPVGNKRMVHFSPD 360

Qy  361 SHHHDHWFSPGARTEHDQHQLLRDNRAERGHKKNCSVRTASRQTSMHLGSLCTGDIKRRR 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  361 SHHHDHWFSPGARTEHDQHQLLRDNRAERGHKKNCSVRTASRQTSMHLGSLCTGDIKRRR 420

Qy  421 KAAPLPGPTTAGFVGENAQPILENNIGNRMLQNMGWTPGSGLGRDGKGISEPIQAMQRPK 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  421 KAAPLPGPTTAGFVGENAQPILENNIGNRMLQNMGWTPGSGLGRDGKGISEPIQAMQRPK 480

Qy  481 GLGLGFPLPKSTSATTTPNAGKSA 504
        ||||||||||||||||||||||||
Db  481 GLGLGFPLPKSTSATTTPNAGKSA 504
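Each Geneseq entry above uses two-letter line codes (ID, AC, DT, KW, PT, CC, SQ and so on) separated by `XX` spacer lines. A long field continues on further lines that repeat the same code. Below is a minimal sketch that groups such a record by code. It assumes exactly that layout: the code sits in the first two columns and the value follows after whitespace. `parse_entry`, `field_text` and `sequence_length` are hypothetical helpers, and the sample text is an abbreviated copy of RESULT 1.

```python
import re
from collections import defaultdict

SAMPLE = """\
ID   AAU06524 standard; protein; 504 AA.
XX
AC   AAU06524;
XX
DT   24-OCT-2001  (first entry)
XX
DE   Prostate and testis-related gene 84P2A9 encoded protein.
XX
KW   84P2A9-related protein; prostate; testis; tissue; cancer; leukaemia;
KW   tumour; kidney; brain; bone; skin; ovary; breast; pancreas; colon; lung;
XX
PN   WO200155391-A2.
XX
SQ   Sequence 504 AA;
"""


def parse_entry(text):
    """Group the lines of a Geneseq-style record by their two-letter line code."""
    fields = defaultdict(list)
    for raw in text.splitlines():
        line = raw.rstrip()
        if len(line) < 2 or line.startswith("XX"):
            continue  # blank lines and 'XX' spacer lines carry no data
        code, value = line[:2], line[2:].strip()
        fields[code].append(value)
    return dict(fields)


def field_text(fields, code):
    """Join the continuation lines of one field (e.g. KW, PT, CC) into one string."""
    return " ".join(fields.get(code, []))


def sequence_length(fields):
    """Read the residue count from the 'SQ   Sequence N AA;' line, or None."""
    m = re.search(r"Sequence\s+(\d+)\s+AA", field_text(fields, "SQ"))
    return int(m.group(1)) if m else None


entry = parse_entry(SAMPLE)
print(entry["AC"])               # ['AAU06524;']
print(field_text(entry, "KW")[:40])
print(sequence_length(entry))    # 504
```

The parser keeps fields that legitimately repeat, such as several PR priority lines, as one list item per line. A fuller parser would need field-specific rules to tell separate values from continuation lines.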
RESULT 2
AAB92632

ID   AAB92632 standard; protein; 528 AA.
XX
AC   AAB92632;
XX
DT   26-JUN-2001  (first entry)
XX
DE   Human protein sequence SEQ ID NO: 10938.
XX
KW   Human; primer; detection; diagnosis; antisense therapy; gene therapy.
XX
OS   Homo sapiens.
XX
PN   EP1074617-A2.
XX
PD   07-FEB-2001.
XX
PF   28-JUL-2000; 2000EP-00116126.
XX
PR   29-JUL-1999;   99JP-00248036.
PR   27-AUG-1999;   99JP-00300253.
PR   11-JAN-2000; 2000JP-00118776.
PR   02-MAY-2000; 2000JP-00183767.
PR   09-JUN-2000; 2000JP-00241899.
XX
PA   (HELI-) HELIX RES INST.
XX
PI   Ota T, Isogai T, Nishikawa T, Hayashi K, Saito K, Yamamoto J;
PI   Ishii S, Sugiyama T, Wakamatsu A, Nagai K, Otsuki T;
XX
DR   WPI; 2001-318749/34.
XX
PT   Primer sets for synthesizing polynucleotides, particularly the 5602 full-
PT   length cDNAs defined in the specification, and for the detection and/or
PT   diagnosis of the abnormality of the proteins encoded by the full-length
PT   cDNAs.
XX
PS   Claim 8; SEQ ID NO 10938; 2537pp + Sequence Listing; English.
XX
CC   The present invention describes primer sets for synthesising 5602 full-
CC   length cDNAs defined in the specification. Where a primer set comprises:
CC   (a) an oligo-dT primer and an oligonucleotide complementary to the
CC   complementary strand of a polynucleotide which comprises one of the 5602
CC   nucleotide sequences defined in the specification, where the
CC   oligonucleotide comprises at least 15 nucleotides; or (b) a combination
CC   of an oligonucleotide comprising a sequence complementary to the
CC   complementary strand of a polynucleotide which comprises a 5'-end
CC   sequence and an oligonucleotide comprising a sequence complementary to a
CC   polynucleotide which comprises a 3'-end sequence, where the
CC   oligonucleotide comprises at least 15 nucleotides and the combination of
CC   the 5'-end sequence/3'-end sequence is selected from those defined in the
CC   specification. The primer sets can be used in antisense therapy and in
CC   gene therapy. The primers are useful for synthesising polynucleotides,
CC   particularly full-length cDNAs. The primers are also useful for the
CC   detection and/or diagnosis of the abnormality of the proteins encoded by
CC   the full-length cDNAs. The primers allow obtaining of the full-length
CC   cDNAs easily without any specialised methods. AAH03166 to AAH13628 and
CC   AAH13633 to AAH18742 represent human cDNA sequences; AAB92446 to AAB95893
CC   represent human amino acid sequences; and AAH13629 to AAH13632 represent
CC   oligonucleotides, all of which are used in the exemplification of the
CC   present invention.
XX
SQ   Sequence 528 AA;

Query Match 100.0%; Score 2694; DB 4; Length 528;
Best Local Similarity 100.0%; Pred. No. 4.2e-228;
Matches 504; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy    1 MEELVHDLVSALEESSEQARGGFAETGDHSRSISCPLKRQARKRRGRKRRSYNVHHPWET 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   25 MEELVHDLVSALEESSEQARGGFAETGDHSRSISCPLKRQARKRRGRKRRSYNVHHPWET 84

Qy   61 GHCLSEGSDSSLEEPSKDYRENHNNNKKDHSDSDDQMLVAKRRPSSNLNNNVRGKRPLWH 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   85 GHCLSEGSDSSLEEPSKDYRENHNNNKKDHSDSDDQMLVAKRRPSSNLNNNVRGKRPLWH 144

Qy  121 ESDFAVDNVGNRTLRRRRKVKRMAVDLPQDISNKRTMTQPPEGCRDQDMDSDRAYQYQEF 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  145 ESDFAVDNVGNRTLRRRRKVKRMAVDLPQDISNKRTMTQPPEGCRDQDMDSDRAYQYQEF 204

Qy  181 TKNKVKKRKLKIIRQGPKIQDEGWLESEETNQTNKDKMECEEQKVSDELMSESDSSSLS 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  205 TKNKVKKRKLKIIRQGPKIQDEGWLESEETNQTNKDKMECEEQKVSDELMSESDSSSLS 264

Qy  241 STDAGLFTNDEGRQGDDEQSDWFYEKESGGACGITGWPWWEKEDPTELDKNVPDPVFES 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  265 STDAGLFTNDEGRQGDDEQSDWFYEKESGGACGITGWPWWEKEDPTELDKNVPDPVFES 324

Qy  301 ILTGSFPLMSHPSRRGFQARLSRLHGMSSKNIKKSGGTPTSMVPIPGPVGNKRMVHFSPD 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  325 ILTGSFPLMSHPSRRGFQARLSRLHGMSSKNIKKSGGTPTSMVPIPGPVGNKRMVHFSPD 384

Qy  361 SHHHDHWFSPGARTEHDQHQLLRDNRAERGHKKNCSVRTASRQTSMHLGSLCTGDIKRRR 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  385 SHHHDHWFSPGARTEHDQHQLLRDNRAERGHKKNCSVRTASRQTSMHLGSLCTGDIKRRR 444

Qy  421 KAAPLPGPTTAGFVGENAQPILENNIGNRMLQNMGWTPGSGLGRDGKGISEPIQAMQRPK 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  445 KAAPLPGPTTAGFVGENAQPILENNIGNRMLQNMGWTPGSGLGRDGKGISEPIQAMQRPK 504

Qy  481 GLGLGFPLPKSTSATTTPNAGKSA 504
        ||||||||||||||||||||||||
Db  505 GLGLGFPLPKSTSATTTPNAGKSA 528



RESULT 3 
ABB97288 

ID   ABB97288 standard; protein; 528 AA.
XX
AC   ABB97288;
XX
DT   28-JUN-2002  (first entry)
XX
DE   Novel human protein SEQ ID NO: 556.
XX
KW   Human; antianaemic; vulnerary; antiinflammatory; immunomodulator;
KW   antiinfertility; cerebroprotective; cytostatic; rheumatic; gene therapy;
KW   neuroprotective; antiparkinsonian; protein therapy; EST;
KW   expressed sequence tag.
XX
OS   Homo sapiens.
XX
PN   WO200222660-A2.
XX
PD   21-MAR-2002.
XX
PF   10-SEP-2001; 2001WO-US026015.
XX
PR   11-SEP-2000; 2000US-00659671.
XX
PA   (HYSE-) HYSEQ INC.
XX
PI   Tang YT, Liu C, Zhou P, Asundi V, Zhang J, Zhao QA, Ren F;
PI   Xue AJ, Yang Y, Wehrman T, Drmanac RT;
XX
DR   WPI; 2002-292408/33.
DR   N-PSDB; ABN32474.
XX
PT   An isolated polynucleotide for treating diseases associated with its
PT   encoded polypeptide such as cancer and multiple sclerosis.
XX
PS   Example 2; SEQ ID NO 556; 509pp; English.
XX
CC   The present invention provides the protein and coding sequences of 444
CC   novel human proteins. These were isolated from expressed sequence tags
CC   (ESTs). They can be used to stimulate cell growth, to regulate
CC   haematopoiesis e.g. to treat aplastic anaemia, to help tissue regrowth
CC   e.g. in burn treatment, to regulate the immune system e.g. to treat
CC   multiple sclerosis, to regulate activin or inhibin e.g. to treat
CC   infertility, to regulate haemostasis or thrombolysis e.g. to treat stroke
CC   and cancer, to screen for drugs, to treat inflammatory conditions e.g.
CC   rheumatoid arthritis, and to treat nervous system disorders e.g.
CC   Parkinson's disease. The present sequence is a protein of the invention.
XX
SQ   Sequence 528 AA;

Query Match 100.0%; Score 2694; DB 5; Length 528;
Best Local Similarity 100.0%; Pred. No. 4.2e-228;
Matches 504; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy    1 MEELVHDLVSALEESSEQARGGFAETGDHSRSISCPLKRQARKRRGRKRRSYNVHHPWET 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   25 MEELVHDLVSALEESSEQARGGFAETGDHSRSISCPLKRQARKRRGRKRRSYNVHHPWET 84

Qy   61 GHCLSEGSDSSLEEPSKDYRENHNNNKKDHSDSDDQMLVAKRRPSSNLNNNVRGKRPLWH 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   85 GHCLSEGSDSSLEEPSKDYRENHNNNKKDHSDSDDQMLVAKRRPSSNLNNNVRGKRPLWH 144

Qy  121 ESDFAVDNVGNRTLRRRRKVKRMAVDLPQDISNKRTMTQPPEGCRDQDMDSDRAYQYQEF 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  145 ESDFAVDNVGNRTLRRRRKVKRMAVDLPQDISNKRTMTQPPEGCRDQDMDSDRAYQYQEF 204

Qy  181 TKNKVKKRKLKIIRQGPKIQDEGWLESEETNQTNKDKMECEEQKVSDELMSESDSSSLS 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  205 TKNKVKKRKLKIIRQGPKIQDEGWLESEETNQTNKDKMECEEQKVSDELMSESDSSSLS 264

Qy  241 STDAGLFTNDEGRQGDDEQSDWFYEKESGGACGITGWPWWEKEDPTELDKNVPDPVFES 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  265 STDAGLFTNDEGRQGDDEQSDWFYEKESGGACGITGWPWWEKEDPTELDKNVPDPVFES 324

Qy  301 ILTGSFPLMSHPSRRGFQARLSRLHGMSSKNIKKSGGTPTSMVPIPGPVGNKRMVHFSPD 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  325 ILTGSFPLMSHPSRRGFQARLSRLHGMSSKNIKKSGGTPTSMVPIPGPVGNKRMVHFSPD 384

Qy  361 SHHHDHWFSPGARTEHDQHQLLRDNRAERGHKKNCSVRTASRQTSMHLGSLCTGDIKRRR 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  385 SHHHDHWFSPGARTEHDQHQLLRDNRAERGHKKNCSVRTASRQTSMHLGSLCTGDIKRRR 444

Qy  421 KAAPLPGPTTAGFVGENAQPILENNIGNRMLQNMGWTPGSGLGRDGKGISEPIQAMQRPK 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  445 KAAPLPGPTTAGFVGENAQPILENNIGNRMLQNMGWTPGSGLGRDGKGISEPIQAMQRPK 504

Qy  481 GLGLGFPLPKSTSATTTPNAGKSA 504
        ||||||||||||||||||||||||
Db  505 GLGLGFPLPKSTSATTTPNAGKSA 528



RESULT 4 
ADR99239 

ID   ADR99239 standard; protein; 376 AA.
XX
AC   ADR99239;
XX
DT   02-DEC-2004  (first entry)
XX
DE   Hypothetical protein FLJ10252, SEQ ID 245.
XX
KW   Cytostatic; breast cancer; cancer; human; FLJ10252.
XX
OS   Homo sapiens.
XX
PN   WO2004078035-A2.
XX
PD   16-SEP-2004.
XX
PF   27-FEB-2004; 2004WO-US007268.
XX
PR   28-FEB-2003; 2003US-0450655P.
XX
PA   (FARB ) BAYER PHARM CORP.
XX
PI   Eveleigh D, Bigwood D;
XX
DR   WPI; 2004-653556/63.
DR   N-PSDB; ADR99112.
XX
PT   Diagnosing breast cancer comprises comparing the level of expression of
PT   genes or gene products in a first biological sample taken from a patient
PT   with that in a normal patient sample.
XX
PS   Claim 3; SEQ ID NO 245; 53pp; English.
XX
CC   The present invention relates to a method (M1) for diagnosing breast
CC   cancer in a patient. The method comprises comparing the level of
CC   expression of one or more genes or gene products in a biological sample
CC   from the patient with that in a normal patient sample, where a difference
CC   in the gene expression in the first sample compared to that in the second
CC   sample is a diagnostic of the disease. Also claimed are: method (M2) for
CC   distinguishing between normal and disease tissues; method (M3) for
CC   monitoring the response of a breast cancer patient to treatment with an
CC   anti-cancer agent; method (M4) for identifying a compound for treating
CC   breast cancer; and an array for distinguishing between normal and disease
CC   tissues comprising two or more probes corresponding to genes selected
CC   from ADR98995-ADR99121 or comprising two or more polypeptides selected
CC   from ADR99122-ADR99248. In M1 and M2 the genes are selected from ADR98995
CC   -ADR99121 and the gene products are polypeptides selected from ADR99122-
CC   ADR99248. M1 is useful for diagnosing breast cancer. M2 and the array are
CC   useful for distinguishing between normal and disease tissue. M3 is useful
CC   for monitoring the response of a breast cancer patient to treatment with
CC   an anti-cancer agent. M4 is useful for identifying a compound for
CC   treating breast cancer. Note: The sequence data for this patent did not
CC   form part of the printed specification, but was obtained in electronic
CC   format directly from WIPO at ftp.wipo.int/pub/published_pct_sequences.
XX
SQ   Sequence 376 AA;



Query Match 67.1%; Score 1808; DB 8; Length 376;
Best Local Similarity 99.7%; Pred. No. 3.1e-150;
Matches 341; Conservative 1; Mismatches 0; Indels 0; Gaps 0;

Qy    1 MEELVHDLVSALEESSEQARGGFAETGDHSRSISCPLKRQARKRRGRKRRSYNVHHPWET 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   25 MEELVHDLVSALEESSEQARGGFAETGDHSRSISCPLKRQARKRRGRKRRSYNVHHPWET 84

Qy   61 GHCLSEGSDSSLEEPSKDYRENHNNNKKDHSDSDDQMLVAKRRPSSNLNNNVRGKRPLWH 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   85 GHCLSEGSDSSLEEPSKDYRENHNNNKKDHSDSDDQMLVAKRRPSSNLNNNVRGKRPLWH 144

Qy  121 ESDFAVDNVGNRTLRRRRKVKRMAVDLPQDISNKRTMTQPPEGCRDQDMDSDRAYQYQEF 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  145 ESDFAVDNVGNRTLRRRRKVKRMAVDLPQDISNKRTMTQPPEGCRDQDMDSDRAYQYQEF 204

Qy  181 TKNKVKKRKLKIIRQGPKIQDEGWLESEETNQTNKDKMECEEQKVSDELMSESDSSSLS 240
        ||||||||||||||||||||:|||||||||||||||||||||||||||||||||||||||
Db  205 TKNKVKKRKLKIIRQGPKIQNEGWLESEETNQTNKDKMECEEQKVSDELMSESDSSSLS 264

Qy  241 STDAGLFTNDEGRQGDDEQSDWFYEKESGGACGITGWPWWEKEDPTELDKNVPDPVFES 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  265 STDAGLFTNDEGRQGDDEQSDWFYEKESGGACGITGWPWWEKEDPTELDKNVPDPVFES 324

Qy  301 ILTGSFPLMSHPSRRGFQARLSRLHGMSSKNIKKSGGTPTSM 342
        ||||||||||||||||||||||||||||||||||||||||||
Db  325 ILTGSFPLMSHPSRRGFQARLSRLHGMSSKNIKKSGGTPTSM 366



RESULT 5 
ABG08002 

ID   ABG08002 standard; protein; 313 AA.
XX
AC   ABG08002;
XX
DT   13-FEB-2002  (first entry)
XX
DE   Novel human diagnostic protein #7993.
XX
KW   Human; chromosome mapping; gene mapping; gene therapy; forensic;
KW   food supplement; medical imaging; diagnostic; genetic disorder.
XX
OS   Homo sapiens.
XX
PN   WO200175067-A2.
XX
PD   11-OCT-2001.
XX
PF   30-MAR-2001; 2001WO-US008631.
XX
PR   31-MAR-2000; 2000US-00540217.
PR   23-AUG-2000; 2000US-00649167.
XX
PA   (HYSE-) HYSEQ INC.
XX
PI   Drmanac RT, Liu C, Tang YT;
XX
DR   WPI; 2001-639362/73.
DR   N-PSDB; AAS72189.
XX
PT   New isolated polynucleotide and encoded polypeptides, useful in
PT   diagnostics, forensics, gene mapping, identification of mutations
PT   responsible for genetic disorders or other traits and to assess
PT   biodiversity.
XX
PS   Claim 20; SEQ ID NO 38361; 103pp; English.
XX
CC   The invention relates to isolated polynucleotide (I) and polypeptide (II)
CC   sequences. (I) is useful as hybridisation probes, polymerase chain
CC   reaction (PCR) primers, oligomers, and for chromosome and gene mapping,
CC   and in recombinant production of (II). The polynucleotides are also used
CC   in diagnostics as expressed sequence tags for identifying expressed
CC   genes. (I) is useful in gene therapy techniques to restore normal
CC   activity of (II) or to treat disease states involving (II). (II) is
CC   useful for generating antibodies against it, detecting or quantitating a
CC   polypeptide in tissue, as molecular weight markers and as a food
CC   supplement. (II) and its binding partners are useful in medical imaging
CC   of sites expressing (II). (I) and (II) are useful for treating disorders
CC   involving aberrant protein expression or biological activity. The
CC   polypeptide and polynucleotide sequences have applications in
CC   diagnostics, forensics, gene mapping, identification of mutations
CC   responsible for genetic disorders or other traits to assess biodiversity
CC   and to produce other types of data and products dependent on DNA and
CC   amino acid sequences. ABG00010-ABG30377 represent novel human diagnostic
CC   amino acid sequences of the invention. Note: The sequence data for this
CC   patent did not appear in the printed specification, but was obtained in
CC   electronic format directly from WIPO at
CC   ftp.wipo.int/pub/published_pct_sequences.
XX
SQ   Sequence 313 AA;

Query Match 30.6%; Score 825.5; DB 4; Length 313;
Best Local Similarity 55.7%; Pred. No. 9.2e-64;
Matches 181; Conservative 4; Mismatches 25; Indels 115; Gaps 5;

Qy    1 MEELVHDLVSALE-ESSEQARGGFAETGD-HSRSISCPLKRQARKRRGRKRRSYNVHHPW 58
Db   37 MEELVHDLVSALERELQSKPRGGFAEPGDPFSEVYPCPLKRPARKRRGRKRRFVXCASP- 95

Qy   59 ETGHCL---SEGSDSSLEEPSKDYRENHNNNKKDHSDSDDQMLVAKRRPSSNLNNNVRGK 115
Db   96 -VGGLVTAXSEGSDSS-------------------------------------------- 110

Qy  116 RPLWHESDFAVDNVGNRTLRRRRKVKRMAVDLPQDISNKRTMTQPPEGCRDQDMDSDRAY 175
Db  111 ------------------------------------------------------------ 110

Qy  176 QYQEFTKNKVKKRKLKIIRQGPKIQDEGWLESEETNQTNKDKMECEEQKVSDELMSESD 235
Db  111 ----FRRTKSKKRKLKIIRQGPKIQDEGWLESEETNQTNKDKMECEEQKVSDELMSESD 166

Qy  236 SSSLSSTDAGLFTNDEGRQGDDEQSDWFYEKESGGACGITGWPWWEKEDPTELDKNVPD 295
Db  167 SSSLSSTDAGLFTNDEGRQGDDEQSDWFYEKESGGACGITGWPWWEKEDPTELDKNVPD 226

Qy  296 PVFESILTGSFPLMSHPSRRGFQAR 320
Db  227 PVFESILTGSFPLMSHPSRRGFPTK 251
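You can check the summary statistics of RESULT 4 and RESULT 5 by hand. Dividing Matches by the total number of aligned columns (Matches + Conservative + Mismatches + Indels) reproduces both reported Best Local Similarity values: 341/342 and 181/325. That formula is inferred from these numbers; the report does not document it. The gap part of the score uses the header's Gapop 10.0 and Gapext 0.5. The report does not show whether the extension penalty is also charged on a gap's first position, so the sketch computes both common affine conventions. `AlignmentStats` and its methods are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class AlignmentStats:
    matches: int
    conservative: int
    mismatches: int
    indels: int
    gaps: int

    def best_local_similarity(self) -> float:
        """Identical positions as a percentage of all aligned columns."""
        columns = self.matches + self.conservative + self.mismatches + self.indels
        return 100.0 * self.matches / columns

    def gap_penalty(self, gap_open: float = 10.0, gap_ext: float = 0.5,
                    ext_on_first: bool = True) -> float:
        """Total affine gap cost under an assumed convention.

        ext_on_first=True:  each gap costs open + ext * length
        ext_on_first=False: each gap costs open + ext * (length - 1)
        """
        extended = self.indels if ext_on_first else self.indels - self.gaps
        return gap_open * self.gaps + gap_ext * extended


result4 = AlignmentStats(matches=341, conservative=1, mismatches=0, indels=0, gaps=0)
result5 = AlignmentStats(matches=181, conservative=4, mismatches=25, indels=115, gaps=5)

print(f"{result4.best_local_similarity():.1f}")  # 99.7
print(f"{result5.best_local_similarity():.1f}")  # 55.7
print(result5.gap_penalty())                     # 107.5
print(result5.gap_penalty(ext_on_first=False))   # 105.0
```

The sketch does not recompute the BLOSUM62 substitution part of the score, so neither gap figure can be checked directly against Score 825.5.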



GenCore version 5.1.6
Copyright (c) 1993 - 2005 Compugen Ltd.

OM protein - nucleic search, using frame_plus_p2n model

Run on:        December 11, 2005, 17:36:59 ; Search time 7575 Seconds
                                            (without alignments)
                                            3782.059 Million cell updates/sec

Title:         US-09-771-312-2
Perfect score: 2694
Sequence:      1 MEELVHDLVSALEESSEQAR.......GFPLPKSTSATTTPNAGKSA 504

Scoring table: BLOSUM62
               Xgapop 10.0 , Xgapext 0.5
               Ygapop 10.0 , Ygapext 0.5
               Fgapop 6.0 , Fgapext 7.0
               Delop 6.0 , Delext 7.0

Searched:      5883141 seqs, 28421725653 residues

Total number of hits satisfying chosen parameters: 11766282

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
Listing first 45 summaries

Command line parameters:
  -MODEL=frame+_p2n.model -DEV=xlh
  -Q=/cgn2_1/USPTO_spool/US09771312/runat_01122005_145311_15042/app_query.fasta_1.647
  -DB=GenEmbl -QFMT=fastap -SUFFIX=rge -MINMATCH=0.1 -LOOPCL=0 -LOOPEXT=0
  -UNITS=bits -START=1 -END=-1 -MATRIX=blosum62 -TRANS=human40.cdi -LIST=45
  -DOCALIGN=200 -THR_SCORE=pct -THR_MAX=100 -THR_MIN=0 -ALIGN=15 -MODE=LOCAL
  -OUTFMT=pto -NORM=ext -HEAPSIZE=500 -MINLEN=0 -MAXLEN=2000000000
  -USER=US09771312_@CGN_1_1_4939_@runat_01122005_145311_15042 -NCPU=6 -ICPU=3
  -NO_MMAP -LARGEQUERY -NEG_SCORES=0 -WAIT -DSPBLOCK=100 -LONGLOG
  -DEV_TIMEOUT=120 -WARN_TIMEOUT=30 -THREADS=1 -XGAPOP=10 -XGAPEXT=0.5 -FGAPOP=6
  -FGAPEXT=7 -YGAPOP=10 -YGAPEXT=0.5 -DELOP=6 -DELEXT=7

Database :     GenEmbl:*
                1: gb_ba:*
                2: gb_in:*
                3: gb_env:*
                4: gb_om:*
                5: gb_ov:*
                6: gb_pat:*
                7: gb_ph:*
                8: gb_pr:*
                9: gb_ro:*
               10: gb_sts:*
               11: gb_sy:*
               12: gb_un:*
               13: gb_vi:*
               14: gb_htg:*
               15: gb_pl:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
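The throughput figures in the two headers follow from simple arithmetic: the number of dynamic-programming cells (query length × database residues) divided by the search time. For the protein search, 504 residues × 439378781 residues in 187 s reproduces the stated 1184.208 exactly. For the protein-nucleic search, the stated 3782.059 matches only if the nucleotide count is doubled. That would fit alignment against six reading frames, each about one-third of the nucleotide length. It would also fit the hit total (11766282) being twice the 5883141 sequences searched. This reading is an inference; the report does not state it. `mcups` and its `db_factor` parameter are hypothetical.

```python
def mcups(query_len: int, db_residues: int, seconds: float,
          db_factor: float = 1.0) -> float:
    """Million dynamic-programming cell updates per second.

    cells = query_len * db_residues * db_factor. A db_factor of 2.0 is an
    assumed correction for the translated (protein vs. nucleic) search.
    """
    cells = query_len * db_residues * db_factor
    return cells / seconds / 1e6


protein_search = mcups(504, 439_378_781, 187)
p2n_search = mcups(504, 28_421_725_653, 7575, db_factor=2.0)
print(f"{protein_search:.3f}")  # 1184.208
print(f"{p2n_search:.3f}")      # 3782.059
```

Without the factor of two, the protein-nucleic figure would come out at about 1891.0, half the reported value.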

SUMMARIES

                   %
Result           Query
   No.    Score  Match  Length  DB  ID        Description

     1     2694  100.0    2338   6  BD155908  BD155908 Primer fo
     2     2694  100.0    2338   6  AX876032  AX876032 Sequence
     3     2694  100.0    2338   8  AK001114  AK001114 Homo sapi
     4     2694  100.0    2344   6  AX405697  AX405697 Sequence
     5     2694  100.0    2345   6  AX206855  AX206855 Sequence
     6     2318   86.0    4537   9  BC054810  BC054810 Mus muscu
     7   2014.5   74.8    5187   5  AJ851518  AJ851518 Gallus ga
     8     1813   67.3    1026   6  CQ720787  CQ720787 Sequence
     9     1813   67.3    4022   6  BD183390  BD183390 Novel gen
    10     1808   67.1    3189   8  BC042193  BC042193 Homo sapi
    11     1807   67.1    3250   8  BC063474  BC063474 Homo sapi
    12   1587.5   58.9    1350   5  BC097745  BC097745 Xenopus l
    13   1513.5   56.2    1392   9  BC079232  BC079232 Rattus no
    14     1229   45.6  135060   8  AL354659  AL354659 Human DNA
    15     1229   45.6  142908  14  AL513172  AL513172 Homo sapi
    16   1072.5   39.8  180315   9  AC107843  AC107843 Mus muscu
    17   1072.5   39.8  260404   9  AC110033  AC110033 Mus muscu
    18     1054   39.1     817   6  BD146304  BD146304 Primer fo
    19     1054   39.1     817      AX866242  AX866242 Sequence
    20     1038   38.5  254644  14  AC136836  AC136836 Rattus no
    21     1038   38.5  256511  14  AC135040  AC135040 Rattus no
c   22     1038   38.5  262721  14  AC106265  AC106265 Rattus no
    23     1008   37.4     759   8  BC027719  BC027719 Homo sapi
    24    627.5   23.3    3947   9  BC048169  BC048169 Mus muscu
    25    627.5   23.3    3947   9  BC058256  BC058256 Mus muscu
    26    627.5   23.3    4311   9  AK129299  AK129299 Mus muscu
    27    627.5   23.3    4314   9  BC050782  BC050782 Mus muscu
    28      608   22.6    1021   8  AK024701  AK024701 Homo sapi
c   29    566.5   21.0  148801   5  BX004824  BX004824 Zebrafish
c   30    566.5   21.0  151096  14  BX005303  BX005303 Danio rer
    31    563.5   20.9    2434   6  AX405970  AX405970 Sequence
    32    563.5   20.9    2463   8  AK000696  AK000696 Homo sapi
    33      544   20.2     407   6  CQ735676  CQ735676 Sequence
    34      534   19.8     469   6  BD108636  BD108636 EST and e
    35      534   19.8     469   6  AR413083  AR413083 Sequence
    36      534   19.8     469   6  AX969917  AX969917 Sequence
    37    530.5   19.7    1474   6  AX405879  AX405879 Sequence
    38    530.5   19.7    1485   8  BC058032  BC058032 Homo sapi
    39    530.5   19.7    2112   6  BD158526  BD158526 Primer fo
    40    530.5   19.7    2112   6  AX880680  AX880680 Sequence
    41    530.5   19.7    2112   8  AK023523  AK023523 Homo sapi
    42    530.5   19.7    6256   8  AB032978  AB032978 Homo sapi
    43    459.5   17.1     526   5  CT025336  CT025336 Xenopus t
    44      422   15.7    1490   8  BC038835  BC038835 Homo sapi
    45      420   15.6     445   6  CQ431223  CQ431223 Sequence
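The Query Match column appears to be the score expressed as a percentage of the header's Perfect score, rounded to one decimal place. The Perfect score is 2694, which is also the score of the 100%-identical hits. The sketch below checks that reading against a few rows of the summary table above. `query_match` is a hypothetical helper, and the rows are copied from the table.

```python
PERFECT_SCORE = 2694.0  # "Perfect score" from the search header

# (ID, Score, reported Query Match %) copied from the summary rows
SUMMARY_ROWS = [
    ("BD155908", 2694.0, 100.0),
    ("BC054810", 2318.0, 86.0),
    ("AJ851518", 2014.5, 74.8),
    ("BC042193", 1808.0, 67.1),
    ("AL354659", 1229.0, 45.6),
    ("BC048169", 627.5, 23.3),
    ("CQ431223", 420.0, 15.6),
]


def query_match(score: float, perfect: float = PERFECT_SCORE) -> float:
    """Score as a percentage of the query's perfect score, to one decimal."""
    return round(100.0 * score / perfect, 1)


for acc, score, reported in SUMMARY_ROWS:
    computed = query_match(score)
    print(acc, computed, "ok" if computed == reported else "MISMATCH")
```

The same ratio also gives the 67.1% and 30.6% reported for the Geneseq hits scoring 1808 and 825.5 in the protein search.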



ALIGNMENTS

RESULT 1
BD155908
LOCUS       BD155908                2338 bp    DNA     linear   PAT 17-JAN-2003
DEFINITION  Primer for synthesizing full-length cDNA and use thereof.
ACCESSION   BD155908
VERSION     BD155908.1  GI:27861666
KEYWORDS    JP 2002191363-A/10751.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;
            Hominidae; Homo.
REFERENCE   1  (bases 1 to 2338)
  AUTHORS   Ota,T., Isogai,T., Nishikawa,T., Hayashi,K., Saito,K.,
            Yamamoto,J., Ishii,S., Sugiyama,T., Wakamatsu,A., Nagai,K. and
            Otsuki,T.
  TITLE     Primer for synthesizing full-length cDNA and use thereof
  JOURNAL   Patent: JP 2002191363-A 10751 09-JUL-2002;
            HELIX RESEARCH INSTITUTE
COMMENT     OS   Homo sapiens (human)
            PN   JP 2002191363-A/10751
            PD   09-JUL-2002
            PF   28-JUL-2000 JP 2000280990
            PI   TOSHIO OTA, TAKAO ISOGAI, TETSUO NISHIKAWA, KOJI HAYASHI, KAORU
            PI   SAITO, JUNICHI YAMAMOTO, SHIZUKO ISHII, TOMOYASU SUGIYAMA, AI
            PI   WAKAMATSU, KEIICHI NAGAI, TETSUJI OTSUKI
            PC   C12N15/09,C07K14/47,C07K16/18,C12N1/15,C12N1/19,C12N1/21,C12N5/10,
            PC   C12P21/02,C12Q1/68//C12P21/08,G06F17/30,C12N15/00,C12N5/00
            CC   Primer for synthesizing full-length cDNA and use thereof
            FH   Key             Location/Qualifiers
            FT   CDS             (99)..(1682).
FEATURES             Location/Qualifiers
     source          1..2338
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
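The CDS feature (99..1682) can be cross-checked against the alignment start (Db 171 against Qy 1). Nucleotide 171 is codon (171 - 99)/3 + 1 = 25 of the CDS. This agrees with the Geneseq proteins AAB92632 and ABB97288, where Qy 1 aligns to residue 25. Also, 1682 - 99 + 1 = 1584 nucleotides, or 528 codons, which is the length of those proteins. The sketch below also translates the first two Db rows of the alignment that follows, using the standard genetic code, and compares them to Qy 1-40. It reads only one forward frame. The frame_plus_p2n model appears to allow frameshifts as well (the Fgapop/Fgapext settings suggest so), which this sketch does not attempt. `translate`, `codon_number` and the constant names are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate whole codons of a forward-strand DNA string, frame 1 only."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def codon_number(nt_pos: int, cds_start: int) -> int:
    """1-based codon index of nucleotide nt_pos in a CDS starting at cds_start."""
    return (nt_pos - cds_start) // 3 + 1


# Db rows 171-230 and 231-290 of the BD155908 alignment, and the query residues
DB_171_290 = ("ATGGAGGAGCTGGTTCATGACCTTGTCTCAGCATTGGAAGAGAGCTCAGAGCAAGCTCGA"
              "GGTGGATTTGCTGAAACAGGAGACCATTCTCGAAGTATATCTTGCCCTCTGAAACGCCAG")
QY_1_40 = "MEELVHDLVSALEESSEQARGGFAETGDHSRSISCPLKRQ"

print(codon_number(171, cds_start=99))   # 25
print((1682 - 99 + 1) // 3)              # 528
print(translate(DB_171_290) == QY_1_40)  # True
```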



Alignment Scores: 

Pred. No. : 5.58e-138 

Score: 2694 . 00 

Percent Similarity: 100.00% 

Best Local Similarity: 100.00% 



Length: 2338 

Matches : 504 

Conservative: 0 

Mismatches: 0 



Query 


Match: 


100.00% Indels: 0 




DB: 




6 Gaps : 0 




US-09- 


-771-312- 


-2 (1-504) x BD155908 (1-2338) 




Qy          1 MetGluGluLeuValHisAspLeuValSerAlaLeuGluGluSerSerGluGlnAlaArg 20
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        171 ATGGAGGAGCTGGTTCATGACCTTGTCTCAGCATTGGAAGAGAGCTCAGAGCAAGCTCGA 230

Qy         21 GlyGlyPheAlaGluThrGlyAspHisSerArgSerIleSerCysProLeuLysArgGln 40
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        231 GGTGGATTTGCTGAAACAGGAGACCATTCTCGAAGTATATCTTGCCCTCTGAAACGCCAG 290

Qy         41 AlaArgLysArgArgGlyArgLysArgArgSerTyrAsnValHisHisProTrpGluThr 60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        291 GCAAGGAAAAGGAGAGGGAGAAAACGGAGGTCGTATAATGTGCATCACCCGTGGGAGACT 350

Qy         61 GlyHisCysLeuSerGluGlySerAspSerSerLeuGluGluProSerLysAspTyrArg 80
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        351 GGTCACTGCTTAAGTGAAGGCTCTGATTCTAGTTTAGAAGAACCAAGCAAGGACTATAGA 410

Qy         81 GluAsnHisAsnAsnAsnLysLysAspHisSerAspSerAspAspGlnMetLeuValAla 100
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        411 GAGAATCACAATAATAATAAAAAAGATCACAGTGACTCTGATGACCAAATGTTAGTAGCA 470

Qy        101 LysArgArgProSerSerAsnLeuAsnAsnAsnValArgGlyLysArgProLeuTrpHis 120
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        471 AAGCGCAGGCCGTCATCAAACTTAAATAATAATGTTCGAGGGAAAAGACCTCTATGGCAT 530

Qy        121 GluSerAspPheAlaValAspAsnValGlyAsnArgThrLeuArgArgArgArgLysVal 140
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        531 GAGTCTGATTTTGCTGTGGACAATGTTGGGAATAGAACTCTGCGCAGGAGGAGAAAGGTA 590

Qy        141 LysArgMetAlaValAspLeuProGlnAspIleSerAsnLysArgThrMetThrGlnPro 160
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        591 AAACGCATGGCAGTAGATCTCCCACAGGACATCTCTAACAAACGGACAATGACCCAGCCA 650

Qy        161 ProGluGlyCysArgAspGlnAspMetAspSerAspArgAlaTyrGlnTyrGlnGluPhe 180
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        651 CCTGAGGGTTGTAGAGATCAGGACATGGACAGTGATAGAGCCTACCAGTATCAAGAATTT 710

Qy        181 ThrLysAsnLysValLysLysArgLysLeuLysIleIleArgGlnGlyProLysIleGln 200
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        711                                                              770

Qy        201 AspGluGlyValValLeuGluSerGluGluThrAsnGlnThrAsnLysAspLysMetGlu 220
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        771 GATGAAGGAGTAGTTTTAGAAAGTGAGGAAACGAACCAGACCAATAAGGACAAAATGGAA 830

Qy        221 CysGluGluGlnLysValSerAspGluLeuMetSerGluSerAspSerSerSerLeuSer 240
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        831 TGTGAAGAGCAAAAAGTCTCAGATGAGCTCATGAGTGAAAGTGATTCCAGCAGTCTCAGC 890

Qy        241 SerThrAspAlaGlyLeuPheThrAsnAspGluGlyArgGlnGlyAspAspGluGlnSer 260
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        891 AGCACTGATGCTGGATTGTTTACCAATGATGAGGGAAGACAAGGTGATGATGAACAGAGT 950

Qy        261 AspTrpPheTyrGluLysGluSerGlyGlyAlaCysGlyIleThrGlyValValProTrp 280
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        951 GACTGGTTCTACGAAAAGGAATCAGGTGGAGCATGTGGTATCACTGGAGTTGTGCCCTGG 1010

Qy        281 TrpGluLysGluAspProThrGluLeuAspLysAsnValProAspProValPheGluSer 300
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1011 TGGGAAAAGGAAGATCCTACTGAGCTAGACAAAAATGTACCAGATCCTGTCTTTGAAAGT 1070

Qy        301 IleLeuThrGlySerPheProLeuMetSerHisProSerArgArgGlyPheGlnAlaArg 320
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1071 ATCTTAACTGGTTCTTTTCCCCTTATGTCACACCCAAGCAGAAGAGGTTTCCAAGCTAGA 1130

Qy        321 LeuSerArgLeuHisGlyMetSerSerLysAsnIleLysLysSerGlyGlyThrProThr 340
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1131 CTCAGTCGCCTTCATGGAATGTCTTCAAAGAATATTAAAAAATCTGGAGGGACTCCAACT 1190

Qy        341 SerMetValProIleProGlyProValGlyAsnLysArgMetValHisPheSerProAsp 360
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1191 TCAATGGTACCCATTCCTGGCCCAGTGGGTAACAAGAGAATGGTTCATTTTTCCCCGGAT 1250

Qy        361 SerHisHisHisAspHisTrpPheSerProGlyAlaArgThrGluHisAspGlnHisGln 380
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1251 TCTCATCACCATGACCATTGGTTTAGCCCTGGGGCTAGGACAGAGCATGACCAGCATCAG 1310

Qy        381 LeuLeuArgAspAsnArgAlaGluArgGlyHisLysLysAsnCysSerValArgThrAla 400
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1311 CTTCTGAGAGATAATCGAGCTGAAAGAGGACACAAGAAAAATTGTTCTGTGAGAACAGCC 1370

Qy        401 SerArgGlnThrSerMetHisLeuGlySerLeuCysThrGlyAspIleLysArgArgArg 420
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1371 AGCAGGCAAACAAGCATGCATTTAGGATCCTTATGCACGGGAGATATCAAACGGAGAAGA 1430

Qy        421 LysAlaAlaProLeuProGlyProThrThrAlaGlyPheValGlyGluAsnAlaGlnPro 440
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1431 AAAGCTGCACCTTTGCCTGGACCTACTACTGCAGGATTTGTAGGTGAAAATGCCCAGCCA 1490

Qy        441 IleLeuGluAsnAsnIleGlyAsnArgMetLeuGlnAsnMetGlyTrpThrProGlySer 460
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1491 ATCCTAGAAAATAATATTGGAAACCGAATGCTTCAGAATATGGGCTGGACGCCTGGGTCA 1550

Qy        461 GlyLeuGlyArgAspGlyLysGlyIleSerGluProIleGlnAlaMetGlnArgProLys 480
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1551 GGCCTTGGACGAGATGGCAAGGGGATCTCTGAGCCAATTCAAGCCATGCAGAGGCCAAAG 1610

Qy        481 GlyLeuGlyLeuGlyPheProLeuProLysSerThrSerAlaThrThrThrProAsnAla 500
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1611 GGATTAGGACTTGGATTTCCTCTACCAAAAAGTACTTCCGCAACTACTACCCCCAATGCA 1670

Qy        501 GlyLysSerAla 504
              ||||||||||||
Db       1671 GGAAAATCCGCC 1682

RESULT 2
AX876032
LOCUS AX876032 2338 bp DNA linear PAT 17-DEC-2003
DEFINITION Sequence 10937 from Patent EP1074617.
ACCESSION AX876032
VERSION AX876032.1 GI:40030768
KEYWORDS
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;
Hominidae; Homo.
REFERENCE 1
AUTHORS Ota,T., Isogai,T., Nishikawa,T., Hayashi,K., Saito,K., Yamamoto,J.,
Ishii,S., Sugiyama,T., Wakamatsu,A., Nagai,K. and Otsuki,T.
TITLE Primers for synthesising full-length cDNA and their use
JOURNAL Patent: EP 1074617-A 10937 07-FEB-2001;
Research Association for Biotechnology (JP)
FEATURES Location/Qualifiers
source 1..2338
/organism="Homo sapiens"
/mol_type="unassigned DNA"
/db_xref="taxon:9606"
CDS 99..1685
/note="unnamed protein product"
/codon_start=1
/protein_id="CAE89098.1"
/db_xref="GI:40030769"
/translation="MFGAAGRQPIGAPAAGNSWHFSRTMEELVHDLVSALEESSEQAR
GGFAETGDHSRSISCPLKRQARKRRGRKRRSYNVHHPWETGHCLSEGSDSSLEEPSKD
YRENHNNNKKDHSDSDDQMLVAKRRPSSNLNNNVRGKRPLWHESDFAVDNVGNRTLRR
RRKVKRMAVDLPQDISNKRTMTQPPEGCRDQDMDSDRAYQYQEFTKNKVKKRKLKIIR
QGPKIQDEGVVLESEETNQTNKDKMECEEQKVSDELMSESDSSSLSSTDAGLFTNDEG
RQGDDEQSDWFYEKESGGACGITGVVPWWEKEDPTELDKNVPDPVFESILTGSFPLMS
HPSRRGFQARLSRLHGMSSKNIKKSGGTPTSMVPIPGPVGNKRMVHFSPDSHHHDHWF
SPGARTEHDQHQLLRDNRAERGHKKNCSVRTASRQTSMHLGSLCTGDIKRRRKAAPLP
GPTTAGFVGENAQPILENNIGNRMLQNMGWTPGSGLGRDGKGISEPIQAMQRPKGLGL
GFPLPKSTSATTTPNAGKSA"
ORIGIN



Alignment Scores:
Pred. No.: 5.58e-138 Length: 2338
Score: 2694.00 Matches: 504
Percent Similarity: 100.00% Conservative: 0
Best Local Similarity: 100.00% Mismatches: 0
Query Match: 100.00% Indels: 0
DB: 6 Gaps: 0

US-09-771-312-2 (1-504) x AX876032 (1-2338)




Qy          1 MetGluGluLeuValHisAspLeuValSerAlaLeuGluGluSerSerGluGlnAlaArg 20
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        171 ATGGAGGAGCTGGTTCATGACCTTGTCTCAGCATTGGAAGAGAGCTCAGAGCAAGCTCGA 230

Qy         21 GlyGlyPheAlaGluThrGlyAspHisSerArgSerIleSerCysProLeuLysArgGln 40
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        231 GGTGGATTTGCTGAAACAGGAGACCATTCTCGAAGTATATCTTGCCCTCTGAAACGCCAG 290

Qy         41 AlaArgLysArgArgGlyArgLysArgArgSerTyrAsnValHisHisProTrpGluThr 60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        291 GCAAGGAAAAGGAGAGGGAGAAAACGGAGGTCGTATAATGTGCATCACCCGTGGGAGACT 350

Qy         61 GlyHisCysLeuSerGluGlySerAspSerSerLeuGluGluProSerLysAspTyrArg 80
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        351 GGTCACTGCTTAAGTGAAGGCTCTGATTCTAGTTTAGAAGAACCAAGCAAGGACTATAGA 410

Qy         81 GluAsnHisAsnAsnAsnLysLysAspHisSerAspSerAspAspGlnMetLeuValAla 100
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        411 GAGAATCACAATAATAATAAAAAAGATCACAGTGACTCTGATGACCAAATGTTAGTAGCA 470

Qy        101 LysArgArgProSerSerAsnLeuAsnAsnAsnValArgGlyLysArgProLeuTrpHis 120
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        471 AAGCGCAGGCCGTCATCAAACTTAAATAATAATGTTCGAGGGAAAAGACCTCTATGGCAT 530

Qy        121 GluSerAspPheAlaValAspAsnValGlyAsnArgThrLeuArgArgArgArgLysVal 140
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        531 GAGTCTGATTTTGCTGTGGACAATGTTGGGAATAGAACTCTGCGCAGGAGGAGAAAGGTA 590

Qy        141 LysArgMetAlaValAspLeuProGlnAspIleSerAsnLysArgThrMetThrGlnPro 160
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        591 AAACGCATGGCAGTAGATCTCCCACAGGACATCTCTAACAAACGGACAATGACCCAGCCA 650

Qy        161 ProGluGlyCysArgAspGlnAspMetAspSerAspArgAlaTyrGlnTyrGlnGluPhe 180
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        651 CCTGAGGGTTGTAGAGATCAGGACATGGACAGTGATAGAGCCTACCAGTATCAAGAATTT 710

Qy        181 ThrLysAsnLysValLysLysArgLysLeuLysIleIleArgGlnGlyProLysIleGln 200
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        711 ACCAAGAACAAAGTCAAAAAAAGAAAGTTGAAAATAATCAGACAAGGACCAAAAATCCAA 770

Qy        201 AspGluGlyValValLeuGluSerGluGluThrAsnGlnThrAsnLysAspLysMetGlu 220
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        771 GATGAAGGAGTAGTTTTAGAAAGTGAGGAAACGAACCAGACCAATAAGGACAAAATGGAA 830

Qy        221 CysGluGluGlnLysValSerAspGluLeuMetSerGluSerAspSerSerSerLeuSer 240
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        831 TGTGAAGAGCAAAAAGTCTCAGATGAGCTCATGAGTGAAAGTGATTCCAGCAGTCTCAGC 890

Qy        241 SerThrAspAlaGlyLeuPheThrAsnAspGluGlyArgGlnGlyAspAspGluGlnSer 260
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        891 AGCACTGATGCTGGATTGTTTACCAATGATGAGGGAAGACAAGGTGATGATGAACAGAGT 950

Qy        261 AspTrpPheTyrGluLysGluSerGlyGlyAlaCysGlyIleThrGlyValValProTrp 280
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        951 GACTGGTTCTACGAAAAGGAATCAGGTGGAGCATGTGGTATCACTGGAGTTGTGCCCTGG 1010

Qy        281 TrpGluLysGluAspProThrGluLeuAspLysAsnValProAspProValPheGluSer 300
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1011 TGGGAAAAGGAAGATCCTACTGAGCTAGACAAAAATGTACCAGATCCTGTCTTTGAAAGT 1070

Qy        301 IleLeuThrGlySerPheProLeuMetSerHisProSerArgArgGlyPheGlnAlaArg 320
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1071 ATCTTAACTGGTTCTTTTCCCCTTATGTCACACCCAAGCAGAAGAGGTTTCCAAGCTAGA 1130

Qy        321 LeuSerArgLeuHisGlyMetSerSerLysAsnIleLysLysSerGlyGlyThrProThr 340
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1131 CTCAGTCGCCTTCATGGAATGTCTTCAAAGAATATTAAAAAATCTGGAGGGACTCCAACT 1190

Qy        341 SerMetValProIleProGlyProValGlyAsnLysArgMetValHisPheSerProAsp 360
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1191 TCAATGGTACCCATTCCTGGCCCAGTGGGTAACAAGAGAATGGTTCATTTTTCCCCGGAT 1250

Qy        361 SerHisHisHisAspHisTrpPheSerProGlyAlaArgThrGluHisAspGlnHisGln 380
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1251 TCTCATCACCATGACCATTGGTTTAGCCCTGGGGCTAGGACAGAGCATGACCAGCATCAG 1310

Qy        381 LeuLeuArgAspAsnArgAlaGluArgGlyHisLysLysAsnCysSerValArgThrAla 400
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1311 CTTCTGAGAGATAATCGAGCTGAAAGAGGACACAAGAAAAATTGTTCTGTGAGAACAGCC 1370

Qy        401 SerArgGlnThrSerMetHisLeuGlySerLeuCysThrGlyAspIleLysArgArgArg 420
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1371 AGCAGGCAAACAAGCATGCATTTAGGATCCTTATGCACGGGAGATATCAAACGGAGAAGA 1430

Qy        421 LysAlaAlaProLeuProGlyProThrThrAlaGlyPheValGlyGluAsnAlaGlnPro 440
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1431 AAAGCTGCACCTTTGCCTGGACCTACTACTGCAGGATTTGTAGGTGAAAATGCCCAGCCA 1490

Qy        441 IleLeuGluAsnAsnIleGlyAsnArgMetLeuGlnAsnMetGlyTrpThrProGlySer 460
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1491 ATCCTAGAAAATAATATTGGAAACCGAATGCTTCAGAATATGGGCTGGACGCCTGGGTCA 1550

Qy        461 GlyLeuGlyArgAspGlyLysGlyIleSerGluProIleGlnAlaMetGlnArgProLys 480
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1551 GGCCTTGGACGAGATGGCAAGGGGATCTCTGAGCCAATTCAAGCCATGCAGAGGCCAAAG 1610

Qy        481 GlyLeuGlyLeuGlyPheProLeuProLysSerThrSerAlaThrThrThrProAsnAla 500
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1611 GGATTAGGACTTGGATTTCCTCTACCAAAAAGTACTTCCGCAACTACTACCCCCAATGCA 1670

Qy        501 GlyLysSerAla 504
              ||||||||||||
Db       1671 GGAAAATCCGCC 1682



RESULT 3
AK001114
LOCUS AK001114 2338 bp mRNA linear PRI 30-JAN-2004
DEFINITION Homo sapiens cDNA FLJ10252 fis, clone HEMBB1000807.
ACCESSION AK001114
VERSION AK001114.1 GI:7022173
KEYWORDS oligo capping; fis (full insert sequence).
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;
Hominidae; Homo.
REFERENCE 1
AUTHORS Ota,T., Suzuki,Y., Nishikawa,T., Otsuki,T., Sugiyama,T., Irie,R.,
Wakamatsu,A., Hayashi,K., Sato,H., Nagai,K., Kimura,K., Makita,H.,
Sekine,M., Obayashi,M., Nishi,T., Shibahara,T., Tanaka,T.,
Ishii,S., Yamamoto,J., Saito,K., Kawai,Y., Isono,Y., Nakamura,Y.,
Nagahari,K., Murakami,K., Yasuda,T., Iwayanagi,T., Wagatsuma,M.,
Shiratori,A., Sudo,H., Hosoiri,T., Kaku,Y., Kodaira,H., Kondo,H.,
Sugawara,M., Takahashi,M., Kanda,K., Yokoi,T., Furuya,T.,
Kikkawa,E., Omura,Y., Abe,K., Kamihara,K., Katsuta,N., Sato,K.,
Tanikawa,M., Yamazaki,M., Ninomiya,K., Ishibashi,T., Yamashita,H.,
Murakawa,K., Fujimori,K., Tanai,H., Kimata,M., Watanabe,M.,
Hiraoka,S., Chiba,Y., Ishida,S., Ono,Y., Takiguchi,S., Watanabe,S.,
Yosida,M., Hotuta,T., Kusano,J., Kanehori,K., Takahashi-Fujii,A.,
Hara,H., Tanase,T., Nomura,Y., Togiya,S., Komai,F., Hara,R.,
Takeuchi,K., Arita,M., Imose,N., Musashino,K., Yuuki,H., Oshima,A.,
Sasaki,N., Aotsuka,S., Yoshikawa,Y., Matsunawa,H., Ichihara,T.,
Shiohata,N., Sano,S., Moriya,S., Momiyama,H., Satoh,N., Takami,S.,
Terashima,Y., Suzuki,O., Nakagawa,S., Senoh,A., Mizoguchi,H.,
Goto,Y., Shimizu,F., Wakebe,H., Hishigaki,H., Watanabe,T.,
Sugiyama,A., Takemoto,M., Kawakami,B., Yamazaki,M., Watanabe,K.,
Kumagai,A., Itakura,S., Fukuzumi,Y., Fujimori,Y., Komiyama,M.,
Tashiro,H., Tanigami,A., Fujiwara,T., Ono,T., Yamada,K., Fujii,Y.,
Ozaki,K., Hirao,M., Ohmori,Y., Kawabata,A., Hikiji,T., Kobatake,N.,
Inagaki,H., Ikema,Y., Okamoto,S., Okitani,R., Kawakami,T.,
Noguchi,S., Itoh,T., Shigeta,K., Senba,T., Matsumura,K.,
Nakajima,Y., Mizuno,T., Morinaga,M., Sasaki,M., Togashi,T.,
Oyama,M., Hata,H., Watanabe,M., Komatsu,T., Mizushima-Sugano,J.,
Satoh,T., Shirai,Y., Takahashi,Y., Nakagawa,K., Okumura,K.,
Nagase,T., Nomura,N., Kikuchi,H., Masuho,Y., Yamashita,R.,
Nakai,K., Yada,T., Nakamura,Y., Ohara,O., Isogai,T. and Sugano,S.
TITLE Complete sequencing and characterization of 21,243 full-length
human cDNAs
JOURNAL Nat. Genet. 36 (1), 40-45 (2004)
PUBMED 14702039
REFERENCE 2
AUTHORS Isogai,T., Ota,T., Hayashi,K., Sugiyama,T., Otsuki,T., Suzuki,Y.,
Nishikawa,T., Nagai,K., Sugano,S., Shiratori,A., Sudo,H.,
Wagatsuma,M., Hosoiri,T., Kaku,Y., Kodaira,H., Kondo,H.,
Sugawara,M., Takahashi,M., Chiba,Y., Ishida,S., Murakawa,K.,
Ono,Y., Takiguchi,S., Watanabe,S., Kimura,K., Murakami,K.,
Ishii,S., Kawai,Y., Saito,K., Yamamoto,J., Wakamatsu,A.,
Nakamura,Y., Nagahari,K., Masuho,Y., Ninomiya,K. and Iwayanagi,T.
TITLE NEDO human cDNA sequencing project
JOURNAL Unpublished
REFERENCE 3 (bases 1 to 2338)
AUTHORS Isogai,T. and Otsuki,T.
TITLE Direct Submission
JOURNAL Submitted (16-FEB-2000) Takao Isogai, Helix Research Institute,
Genomics Laboratory; 1532-3 Yana, Kisarazu, Chiba 292-0812, Japan
(E-mail:genomics@hri.co.jp, Tel:81-438-52-3975, Fax:81-438-52-3986)
COMMENT NEDO human cDNA sequencing project supported by Ministry of
International Trade and Industry of Japan; cDNA full insert
sequencing: Research Association for Biotechnology; cDNA library
construction, 5'- & 3'-end one pass sequencing and clone
selection: Helix Research Institute (supported by Japan Key Technology Center
etc.) and Department of Virology, Institute of Medical Science,
University of Tokyo.
FEATURES Location/Qualifiers
source 1..2338
/organism="Homo sapiens"
/mol_type="mRNA"
/db_xref="taxon:9606"
/clone="HEMBB1000807"
/tissue_type="whole embryo, mainly body"
/clone_lib="HEMBB1"
/dev_stage="embryo, 10 weeks"
/note="cloning vector: pME18SFL3"
CDS 99..1685
/note="unnamed protein product"
/codon_start=1
/protein_id="BAA91509.1"
/db_xref="GI:7022174"
/translation="MFGAAGRQPIGAPAAGNSWHFSRTMEELVHDLVSALEESSEQAR
GGFAETGDHSRSISCPLKRQARKRRGRKRRSYNVHHPWETGHCLSEGSDSSLEEPSKD
YRENHNNNKKDHSDSDDQMLVAKRRPSSNLNNNVRGKRPLWHESDFAVDNVGNRTLRR
RRKVKRMAVDLPQDISNKRTMTQPPEGCRDQDMDSDRAYQYQEFTKNKVKKRKLKIIR
QGPKIQDEGVVLESEETNQTNKDKMECEEQKVSDELMSESDSSSLSSTDAGLFTNDEG
RQGDDEQSDWFYEKESGGACGITGVVPWWEKEDPTELDKNVPDPVFESILTGSFPLMS
HPSRRGFQARLSRLHGMSSKNIKKSGGTPTSMVPIPGPVGNKRMVHFSPDSHHHDHWF
SPGARTEHDQHQLLRDNRAERGHKKNCSVRTASRQTSMHLGSLCTGDIKRRRKAAPLP
GPTTAGFVGENAQPILENNIGNRMLQNMGWTPGSGLGRDGKGISEPIQAMQRPKGLGL
GFPLPKSTSATTTPNAGKSA"
ORIGIN

Alignment Scores: 

Pred. No.: 5.58e-138 Length: 2338 

Score: 2694.00 Matches: 504 

Percent Similarity: 100.00% Conservative: 0 

Best Local Similarity: 100.00% Mismatches: 0 

Query Match: 100.00% Indels: 0

DB: 8 Gaps: 0 

US-09-771-312-2 (1-504) x AK001114 (1-2338) 

Qy          1 MetGluGluLeuValHisAspLeuValSerAlaLeuGluGluSerSerGluGlnAlaArg 20
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        171 ATGGAGGAGCTGGTTCATGACCTTGTCTCAGCATTGGAAGAGAGCTCAGAGCAAGCTCGA 230

Qy         21 GlyGlyPheAlaGluThrGlyAspHisSerArgSerIleSerCysProLeuLysArgGln 40
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        231 GGTGGATTTGCTGAAACAGGAGACCATTCTCGAAGTATATCTTGCCCTCTGAAACGCCAG 290

Qy         41 AlaArgLysArgArgGlyArgLysArgArgSerTyrAsnValHisHisProTrpGluThr 60
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        291 GCAAGGAAAAGGAGAGGGAGAAAACGGAGGTCGTATAATGTGCATCACCCGTGGGAGACT 350

Qy         61 GlyHisCysLeuSerGluGlySerAspSerSerLeuGluGluProSerLysAspTyrArg 80
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        351 GGTCACTGCTTAAGTGAAGGCTCTGATTCTAGTTTAGAAGAACCAAGCAAGGACTATAGA 410

Qy         81 GluAsnHisAsnAsnAsnLysLysAspHisSerAspSerAspAspGlnMetLeuValAla 100
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        411 GAGAATCACAATAATAATAAAAAAGATCACAGTGACTCTGATGACCAAATGTTAGTAGCA 470

Qy        101 LysArgArgProSerSerAsnLeuAsnAsnAsnValArgGlyLysArgProLeuTrpHis 120
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        471 AAGCGCAGGCCGTCATCAAACTTAAATAATAATGTTCGAGGGAAAAGACCTCTATGGCAT 530

Qy        121 GluSerAspPheAlaValAspAsnValGlyAsnArgThrLeuArgArgArgArgLysVal 140
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        531 GAGTCTGATTTTGCTGTGGACAATGTTGGGAATAGAACTCTGCGCAGGAGGAGAAAGGTA 590

Qy        141 LysArgMetAlaValAspLeuProGlnAspIleSerAsnLysArgThrMetThrGlnPro 160
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        591 AAACGCATGGCAGTAGATCTCCCACAGGACATCTCTAACAAACGGACAATGACCCAGCCA 650

Qy        161 ProGluGlyCysArgAspGlnAspMetAspSerAspArgAlaTyrGlnTyrGlnGluPhe 180
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        651 CCTGAGGGTTGTAGAGATCAGGACATGGACAGTGATAGAGCCTACCAGTATCAAGAATTT 710

Qy        181 ThrLysAsnLysValLysLysArgLysLeuLysIleIleArgGlnGlyProLysIleGln 200
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        711 ACCAAGAACAAAGTCAAAAAAAGAAAGTTGAAAATAATCAGACAAGGACCAAAAATCCAA 770

Qy        201 AspGluGlyValValLeuGluSerGluGluThrAsnGlnThrAsnLysAspLysMetGlu 220
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        771 GATGAAGGAGTAGTTTTAGAAAGTGAGGAAACGAACCAGACCAATAAGGACAAAATGGAA 830

Qy        221 CysGluGluGlnLysValSerAspGluLeuMetSerGluSerAspSerSerSerLeuSer 240
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        831 TGTGAAGAGCAAAAAGTCTCAGATGAGCTCATGAGTGAAAGTGATTCCAGCAGTCTCAGC 890

Qy        241 SerThrAspAlaGlyLeuPheThrAsnAspGluGlyArgGlnGlyAspAspGluGlnSer 260
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        891 AGCACTGATGCTGGATTGTTTACCAATGATGAGGGAAGACAAGGTGATGATGAACAGAGT 950

Qy        261 AspTrpPheTyrGluLysGluSerGlyGlyAlaCysGlyIleThrGlyValValProTrp 280
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db        951 GACTGGTTCTACGAAAAGGAATCAGGTGGAGCATGTGGTATCACTGGAGTTGTGCCCTGG 1010

Qy        281 TrpGluLysGluAspProThrGluLeuAspLysAsnValProAspProValPheGluSer 300
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1011 TGGGAAAAGGAAGATCCTACTGAGCTAGACAAAAATGTACCAGATCCTGTCTTTGAAAGT 1070

Qy        301 IleLeuThrGlySerPheProLeuMetSerHisProSerArgArgGlyPheGlnAlaArg 320
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1071 ATCTTAACTGGTTCTTTTCCCCTTATGTCACACCCAAGCAGAAGAGGTTTCCAAGCTAGA 1130

Qy        321 LeuSerArgLeuHisGlyMetSerSerLysAsnIleLysLysSerGlyGlyThrProThr 340
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1131 CTCAGTCGCCTTCATGGAATGTCTTCAAAGAATATTAAAAAATCTGGAGGGACTCCAACT 1190

Qy        341 SerMetValProIleProGlyProValGlyAsnLysArgMetValHisPheSerProAsp 360
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1191 TCAATGGTACCCATTCCTGGCCCAGTGGGTAACAAGAGAATGGTTCATTTTTCCCCGGAT 1250

Qy        361 SerHisHisHisAspHisTrpPheSerProGlyAlaArgThrGluHisAspGlnHisGln 380
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1251 TCTCATCACCATGACCATTGGTTTAGCCCTGGGGCTAGGACAGAGCATGACCAGCATCAG 1310

Qy        381 LeuLeuArgAspAsnArgAlaGluArgGlyHisLysLysAsnCysSerValArgThrAla 400
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1311 CTTCTGAGAGATAATCGAGCTGAAAGAGGACACAAGAAAAATTGTTCTGTGAGAACAGCC 1370

Qy        401 SerArgGlnThrSerMetHisLeuGlySerLeuCysThrGlyAspIleLysArgArgArg 420
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1371 AGCAGGCAAACAAGCATGCATTTAGGATCCTTATGCACGGGAGATATCAAACGGAGAAGA 1430

Qy        421 LysAlaAlaProLeuProGlyProThrThrAlaGlyPheValGlyGluAsnAlaGlnPro 440
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1431 AAAGCTGCACCTTTGCCTGGACCTACTACTGCAGGATTTGTAGGTGAAAATGCCCAGCCA 1490

Qy        441 IleLeuGluAsnAsnIleGlyAsnArgMetLeuGlnAsnMetGlyTrpThrProGlySer 460
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1491 ATCCTAGAAAATAATATTGGAAACCGAATGCTTCAGAATATGGGCTGGACGCCTGGGTCA 1550

Qy        461 GlyLeuGlyArgAspGlyLysGlyIleSerGluProIleGlnAlaMetGlnArgProLys 480
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1551 GGCCTTGGACGAGATGGCAAGGGGATCTCTGAGCCAATTCAAGCCATGCAGAGGCCAAAG 1610

Qy        481 GlyLeuGlyLeuGlyPheProLeuProLysSerThrSerAlaThrThrThrProAsnAla 500
              ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db       1611 GGATTAGGACTTGGATTTCCTCTACCAAAAAGTACTTCCGCAACTACTACCCCCAATGCA 1670

Qy        501 GlyLysSerAla 504
              ||||||||||||
Db       1671 GGAAAATCCGCC 1682



RESULT 4
AX405697
LOCUS AX405697 2344 bp DNA linear PAT 14-JUN-2002
DEFINITION Sequence 112 from Patent WO0222660.
ACCESSION AX405697
VERSION AX405697.1 GI:21438833
KEYWORDS
SOURCE Homo sapiens (human)
ORGANISM Homo sapiens
Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;
Hominidae; Homo.
REFERENCE 1
AUTHORS Tang,Y.T., Liu,C., Zhou,P., Asundi,V., Zhang,J., Zhao,Q.A., Ren,F.,
Xue,A.J., Yang,Y., Wehrman,T. and Drmanac,R.T.
TITLE Novel nucleic acids and polypeptides
JOURNAL Patent: WO 0222660-A 112 21-MAR-2002;
HYSEQ, INC. (US)
FEATURES Location/Qualifiers
source 1..2344
/organism="Homo sapiens"
/mol_type="unassigned DNA"
/db_xref="taxon:9606"
CDS 98..1684
/note="unnamed protein product"
/codon_start=1
/protein_id="CAD34804.1"
/db_xref="GI:21438834"
/translation="MFGAAGRQPIGAPAAGNSWHFSRTMEELVHDLVSALEESSEQAR
GGFAETGDHSRSISCPLKRQARKRRGRKRRSYNVHHPWETGHCLSEGSDSSLEEPSKD
YRENHNNNKKDHSDSDDQMLVAKRRPSSNLNNNVRGKRPLWHESDFAVDNVGNRTLRR
RRKVKRMAVDLPQDISNKRTMTQPPEGCRDQDMDSDRAYQYQEFTKNKVKKRKLKIIR
QGPKIQDEGVVLESEETNQTNKDKMECEEQKVSDELMSESDSSSLSSTDAGLFTNDEG
RQGDDEQSDWFYEKESGGACGITGVVPWWEKEDPTELDKNVPDPVFESILTGSFPLMS
HPSRRGFQARLSRLHGMSSKNIKKSGGTPTSMVPIPGPVGNKRMVHFSPDSHHHDHWF
SPGARTEHDQHQLLRDNRAERGHKKNCSVRTASRQTSMHLGSLCTGDIKRRRKAAPLP
GPTTAGFVGENAQPILENNIGNRMLQNMGWTPGSGLGRDGKGISEPIQAMQRPKGLGL
GFPLPKSTSATTTPNAGKSA"
ORIGIN

Alignment Scores:
Pred. No.: 5.59e-138 Length: 2344
Score: 2694.00 Matches: 504
Percent Similarity: 100.00% Conservative: 0
Best Local Similarity: 100.00% Mismatches: 0
Query Match: 100.00% Indels: 0
DB: 6 Gaps: 0

US-09-771-312-2 (1-504) x AX405697 (1-2344)




Qy      1 MetGluGluLeuValHisAspLeuValSerAlaLeuGluGluSerSerGluGlnAlaArg 20
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    170 ATGGAGGAGCTGGTTCATGACCTTGTCTCAGCATTGGAAGAGAGCTCAGAGCAAGCTCGA 229

Qy     21 GlyGlyPheAlaGluThrGlyAspHisSerArgSerIleSerCysProLeuLysArgGln 40
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    230 GGTGGATTTGCTGAAACAGGAGACCATTCTCGAAGTATATCTTGCCCTCTGAAACGCCAG 289

Qy     41 AlaArgLysArgArgGlyArgLysArgArgSerTyrAsnValHisHisProTrpGluThr 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    290 GCAAGGAAAAGGAGAGGGAGAAAACGGAGGTCGTATAATGTGCATCACCCGTGGGAGACT 349

Qy     61 GlyHisCysLeuSerGluGlySerAspSerSerLeuGluGluProSerLysAspTyrArg 80
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    350 GGTCACTGCTTAAGTGAAGGCTCTGATTCTAGTTTAGAAGAACCAAGCAAGGACTATAGA 409

Qy     81 GluAsnHisAsnAsnAsnLysLysAspHisSerAspSerAspAspGlnMetLeuValAla 100
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    410 GAGAATCACAATAATAATAAAAAAGATCACAGTGACTCTGATGACCAAATGTTAGTAGCA 469

Qy    101 LysArgArgProSerSerAsnLeuAsnAsnAsnValArgGlyLysArgProLeuTrpHis 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    470 AAGCGCAGGCCGTCATCAAACTTAAATAATAATGTTCGAGGGAAAAGACCTCTATGGCAT 529

Qy    121 GluSerAspPheAlaValAspAsnValGlyAsnArgThrLeuArgArgArgArgLysVal 140
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    530 GAGTCTGATTTTGCTGTGGACAATGTTGGGAATAGAACTCTGCGCAGGAGGAGAAAGGTA 589

Qy    141 LysArgMetAlaValAspLeuProGlnAspIleSerAsnLysArgThrMetThrGlnPro 160
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    590 AAACGCATGGCAGTAGATCTCCCACAGGACATCTCTAACAAACGGACAATGACCCAGCCA 649

Qy    161 ProGluGlyCysArgAspGlnAspMetAspSerAspArgAlaTyrGlnTyrGlnGluPhe 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    650 CCTGAGGGTTGTAGAGATCAGGACATGGACAGTGATAGAGCCTACCAGTATCAAGAATTT 709

Qy    181 ThrLysAsnLysValLysLysArgLysLeuLysIleIleArgGlnGlyProLysIleGln 200
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    710 ACCAAGAACAAAGTCAAAAAAAGAAAGTTGAAAATAATCAGACAAGGACCAAAAATCCAA 769

Qy    201 AspGluGlyValValLeuGluSerGluGluThrAsnGlnThrAsnLysAspLysMetGlu 220
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    770 GATGAAGGAGTAGTTTTAGAAAGTGAGGAAACGAACCAGACCAATAAGGACAAAATGGAA 829

Qy    221 CysGluGluGlnLysValSerAspGluLeuMetSerGluSerAspSerSerSerLeuSer 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    830 TGTGAAGAGCAAAAAGTCTCAGATGAGCTCATGAGTGAAAGTGATTCCAGCAGTCTCAGC 889

Qy    241 SerThrAspAlaGlyLeuPheThrAsnAspGluGlyArgGlnGlyAspAspGluGlnSer 260
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    890 AGCACTGATGCTGGATTGTTTACCAATGATGAGGGAAGACAAGGTGATGATGAACAGAGT 949

Qy    261 AspTrpPheTyrGluLysGluSerGlyGlyAlaCysGlyIleThrGlyValValProTrp 280
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    950 GACTGGTTCTACGAAAAGGAATCAGGTGGAGCATGTGGTATCACTGGAGTTGTGCCCTGG 1009

Qy    281 TrpGluLysGluAspProThrGluLeuAspLysAsnValProAspProValPheGluSer 300
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1010 TGGGAAAAGGAAGATCCTACTGAGCTAGACAAAAATGTACCAGATCCTGTCTTTGAAAGT 1069

Qy    301 IleLeuThrGlySerPheProLeuMetSerHisProSerArgArgGlyPheGlnAlaArg 320
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1070 ATCTTAACTGGTTCTTTTCCCCTTATGTCACACCCAAGCAGAAGAGGTTTCCAAGCTAGA 1129

Qy    321 LeuSerArgLeuHisGlyMetSerSerLysAsnIleLysLysSerGlyGlyThrProThr 340
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1130 CTCAGTCGCCTTCATGGAATGTCTTCAAAGAATATTAAAAAATCTGGAGGGACTCCAACT 1189

Qy    341 SerMetValProIleProGlyProValGlyAsnLysArgMetValHisPheSerProAsp 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1190 TCAATGGTACCCATTCCTGGCCCAGTGGGTAACAAGAGAATGGTTCATTTTTCCCCGGAT 1249

Qy    361 SerHisHisHisAspHisTrpPheSerProGlyAlaArgThrGluHisAspGlnHisGln 380
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1250 TCTCATCACCATGACCATTGGTTTAGCCCTGGGGCTAGGACAGAGCATGACCAGCATCAG 1309

Qy    381 LeuLeuArgAspAsnArgAlaGluArgGlyHisLysLysAsnCysSerValArgThrAla 400
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1310 CTTCTGAGAGATAATCGAGCTGAAAGAGGACACAAGAAAAATTGTTCTGTGAGAACAGCC 1369

Qy    401 SerArgGlnThrSerMetHisLeuGlySerLeuCysThrGlyAspIleLysArgArgArg 420
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1370 AGCAGGCAAACAAGCATGCATTTAGGATCCTTATGCACGGGAGATATCAAACGGAGAAGA 1429

Qy    421 LysAlaAlaProLeuProGlyProThrThrAlaGlyPheValGlyGluAsnAlaGlnPro 440
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1430 AAAGCTGCACCTTTGCCTGGACCTACTACTGCAGGATTTGTAGGTGAAAATGCCCAGCCA 1489

Qy    441 IleLeuGluAsnAsnIleGlyAsnArgMetLeuGlnAsnMetGlyTrpThrProGlySer 460
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1490 ATCCTAGAAAATAATATTGGAAACCGAATGCTTCAGAATATGGGCTGGACGCCTGGGTCA 1549

Qy    461 GlyLeuGlyArgAspGlyLysGlyIleSerGluProIleGlnAlaMetGlnArgProLys 480
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1550 GGCCTTGGACGAGATGGCAAGGGGATCTCTGAGCCAATTCAAGCCATGCAGAGGCCAAAG 1609

Qy    481 GlyLeuGlyLeuGlyPheProLeuProLysSerThrSerAlaThrThrThrProAsnAla 500
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1610 GGATTAGGACTTGGATTTCCTCTACCAAAAAGTACTTCCGCAACTACTACCCCCAATGCA 1669

Qy    501 GlyLysSerAla 504
          ||||||||||||
Db   1670 GGAAAATCCGCC 1681
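Because this alignment has no indels or gaps, each query residue occupies exactly one codon, so the Db coordinate of any residue follows from the Db start of residue 1 (170 for AX405697). A minimal Python sketch of that arithmetic; `codon_span` is a hypothetical helper name, not part of the search software, and it assumes an ungapped, in-frame alignment:

```python
def codon_span(db_start: int, query_pos: int) -> tuple[int, int]:
    """Return the 1-based (first, last) nucleotide of the codon for query_pos.

    Assumes an ungapped protein-vs-DNA alignment in which query residue 1
    is aligned to the codon that begins at db_start.
    """
    if query_pos < 1:
        raise ValueError("query positions are 1-based")
    first = db_start + 3 * (query_pos - 1)
    return first, first + 2


if __name__ == "__main__":
    # AX405697: query residue 1 is aligned to the codon starting at Db 170.
    for pos in (1, 21, 501, 504):
        print(pos, codon_span(170, pos))
```

`codon_span(170, 504)` returns `(1679, 1681)`, the last aligned codon above; with a start of 163 the same function gives `(223, 225)` for residue 21, matching the second row of the AX206855 alignment.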



RESULT 5
AX206855
LOCUS       AX206855    2345 bp    DNA    linear    PAT 30-AUG-2001
DEFINITION  Sequence 1 from Patent WO0155391.
ACCESSION   AX206855
VERSION     AX206855.1  GI:15394681
KEYWORDS
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;
            Hominidae; Homo.
REFERENCE   1
  AUTHORS   Jakobovits,A., Afar,D.E., Challita-Eid,P.M., Levin,E.,
            Mitchell,S.C. and Hubert,R.S.
  TITLE     84p2a9: a prostate and testis specific protein highly expressed in
            prostate cancer
  JOURNAL   Patent: WO 0155391-A 1 02-AUG-2001;
            Urogenesys, Inc. (US)
FEATURES             Location/Qualifiers
source 1..2345
/organism="Homo sapiens"
/mol_type="unassigned DNA"
/db_xref="taxon:9606"
CDS 163..1677
/note="unnamed protein product"
/codon_start=1
/protein_id="CAC60223.1"
/db_xref="GI:15394682"
/translation="MEELVHDLVSALEESSEQARGGFAETGDHSRSISCPLKRQARKR
RGRKRRSYNVHHPWETGHCLSEGSDSSLEEPSKDYRENHNNNKKDHSDSDDQMLVAKR
RPSSNLNNNVRGKRPLWHESDFAVDNVGNRTLRRRRKVKRMAVDLPQDISNKRTMTQP
PEGCRDQDMDSDRAYQYQEFTKNKVKKRKLKIIRQGPKIQDEGVVLESEETNQTNKDK
MECEEQKVSDELMSESDSSSLSSTDAGLFTNDEGRQGDDEQSDWFYEKESGGACGITG
VVPWWEKEDPTELDKNVPDPVFESILTGSFPLMSHPSRRGFQARLSRLHGMSSKNIKK
SGGTPTSMVPIPGPVGNKRMVHFSPDSHHHDHWFSPGARTEHDQHQLLRDNRAERGHK
KNCSVRTASRQTSMHLGSLCTGDIKRRRKAAPLPGPTTAGFVGENAQPILENNIGNRM
LQNMGWTPGSGLGRDGKGISEPIQAMQRPKGLGLGFPLPKSTSATTTPNAGKSA"

ORIGIN 

Alignment Scores: 

Pred. No.: 5.6e-138 Length: 2345 

Score: 2694.00 Matches: 504 

Percent Similarity: 100.00% Conservative: 0 

Best Local Similarity: 100.00% Mismatches: 0 

Query Match: 100.00% Indels: 0

DB: 6 Gaps: 0 

US-09-771-312-2 (1-504) x AX206855 (1-2345) 

Qy      1 MetGluGluLeuValHisAspLeuValSerAlaLeuGluGluSerSerGluGlnAlaArg 20
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    163 ATGGAGGAGCTGGTTCATGACCTTGTCTCAGCATTGGAAGAGAGCTCAGAGCAAGCTCGA 222

Qy     21 GlyGlyPheAlaGluThrGlyAspHisSerArgSerIleSerCysProLeuLysArgGln 40
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    223 GGTGGATTTGCTGAAACAGGAGACCATTCTCGAAGTATATCTTGCCCTCTGAAACGCCAG 282

Qy     41 AlaArgLysArgArgGlyArgLysArgArgSerTyrAsnValHisHisProTrpGluThr 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    283 GCAAGGAAAAGGAGAGGGAGAAAACGGAGGTCGTATAATGTGCATCACCCGTGGGAGACT 342

Qy     61 GlyHisCysLeuSerGluGlySerAspSerSerLeuGluGluProSerLysAspTyrArg 80
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    343 GGTCACTGCTTAAGTGAAGGCTCTGATTCTAGTTTAGAAGAACCAAGCAAGGACTATAGA 402

Qy     81 GluAsnHisAsnAsnAsnLysLysAspHisSerAspSerAspAspGlnMetLeuValAla 100
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    403 GAGAATCACAATAATAATAAAAAAGATCACAGTGACTCTGATGACCAAATGTTAGTAGCA 462

Qy    101 LysArgArgProSerSerAsnLeuAsnAsnAsnValArgGlyLysArgProLeuTrpHis 120
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    463 AAGCGCAGGCCGTCATCAAACTTAAATAATAATGTTCGAGGGAAAAGACCTCTATGGCAT 522

Qy    121 GluSerAspPheAlaValAspAsnValGlyAsnArgThrLeuArgArgArgArgLysVal 140
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    523 GAGTCTGATTTTGCTGTGGACAATGTTGGGAATAGAACTCTGCGCAGGAGGAGAAAGGTA 582

Qy    141 LysArgMetAlaValAspLeuProGlnAspIleSerAsnLysArgThrMetThrGlnPro 160
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    583 AAACGCATGGCAGTAGATCTCCCACAGGACATCTCTAACAAACGGACAATGACCCAGCCA 642

Qy    161 ProGluGlyCysArgAspGlnAspMetAspSerAspArgAlaTyrGlnTyrGlnGluPhe 180
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    643 CCTGAGGGTTGTAGAGATCAGGACATGGACAGTGATAGAGCCTACCAGTATCAAGAATTT 702

Qy    181 ThrLysAsnLysValLysLysArgLysLeuLysIleIleArgGlnGlyProLysIleGln 200
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    703 ACCAAGAACAAAGTCAAAAAAAGAAAGTTGAAAATAATCAGACAAGGACCAAAAATCCAA 762

Qy    201 AspGluGlyValValLeuGluSerGluGluThrAsnGlnThrAsnLysAspLysMetGlu 220
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    763 GATGAAGGAGTAGTTTTAGAAAGTGAGGAAACGAACCAGACCAATAAGGACAAAATGGAA 822

Qy    221 CysGluGluGlnLysValSerAspGluLeuMetSerGluSerAspSerSerSerLeuSer 240
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    823 TGTGAAGAGCAAAAAGTCTCAGATGAGCTCATGAGTGAAAGTGATTCCAGCAGTCTCAGC 882

Qy    241 SerThrAspAlaGlyLeuPheThrAsnAspGluGlyArgGlnGlyAspAspGluGlnSer 260
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    883 AGCACTGATGCTGGATTGTTTACCAATGATGAGGGAAGACAAGGTGATGATGAACAGAGT 942

Qy    261 AspTrpPheTyrGluLysGluSerGlyGlyAlaCysGlyIleThrGlyValValProTrp 280
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    943 GACTGGTTCTACGAAAAGGAATCAGGTGGAGCATGTGGTATCACTGGAGTTGTGCCCTGG 1002

Qy    281 TrpGluLysGluAspProThrGluLeuAspLysAsnValProAspProValPheGluSer 300
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1003 TGGGAAAAGGAAGATCCTACTGAGCTAGACAAAAATGTACCAGATCCTGTCTTTGAAAGT 1062

Qy    301 IleLeuThrGlySerPheProLeuMetSerHisProSerArgArgGlyPheGlnAlaArg 320
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1063 ATCTTAACTGGTTCTTTTCCCCTTATGTCACACCCAAGCAGAAGAGGTTTCCAAGCTAGA 1122

Qy    321 LeuSerArgLeuHisGlyMetSerSerLysAsnIleLysLysSerGlyGlyThrProThr 340
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1123 CTCAGTCGCCTTCATGGAATGTCTTCAAAGAATATTAAAAAATCTGGAGGGACTCCAACT 1182

Qy    341 SerMetValProIleProGlyProValGlyAsnLysArgMetValHisPheSerProAsp 360
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1183 TCAATGGTACCCATTCCTGGCCCAGTGGGTAACAAGAGAATGGTTCATTTTTCCCCGGAT 1242

Qy    361 SerHisHisHisAspHisTrpPheSerProGlyAlaArgThrGluHisAspGlnHisGln 380
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1243 TCTCATCACCATGACCATTGGTTTAGCCCTGGGGCTAGGACAGAGCATGACCAGCATCAG 1302

Qy    381 LeuLeuArgAspAsnArgAlaGluArgGlyHisLysLysAsnCysSerValArgThrAla 400
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1303 CTTCTGAGAGATAATCGAGCTGAAAGAGGACACAAGAAAAATTGTTCTGTGAGAACAGCC 1362

Qy    401 SerArgGlnThrSerMetHisLeuGlySerLeuCysThrGlyAspIleLysArgArgArg 420
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1363 AGCAGGCAAACAAGCATGCATTTAGGATCCTTATGCACGGGAGATATCAAACGGAGAAGA 1422

Qy    421 LysAlaAlaProLeuProGlyProThrThrAlaGlyPheValGlyGluAsnAlaGlnPro 440
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1423 AAAGCTGCACCTTTGCCTGGACCTACTACTGCAGGATTTGTAGGTGAAAATGCCCAGCCA 1482

Qy    441 IleLeuGluAsnAsnIleGlyAsnArgMetLeuGlnAsnMetGlyTrpThrProGlySer 460
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1483 ATCCTAGAAAATAATATTGGAAACCGAATGCTTCAGAATATGGGCTGGACGCCTGGGTCA 1542

Qy    461 GlyLeuGlyArgAspGlyLysGlyIleSerGluProIleGlnAlaMetGlnArgProLys 480
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1543 GGCCTTGGACGAGATGGCAAGGGGATCTCTGAGCCAATTCAAGCCATGCAGAGGCCAAAG 1602

Qy    481 GlyLeuGlyLeuGlyPheProLeuProLysSerThrSerAlaThrThrThrProAsnAla 500
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1603 GGATTAGGACTTGGATTTCCTCTACCAAAAAGTACTTCCGCAACTACTACCCCCAATGCA 1662

Qy    501 GlyLysSerAla 504
          ||||||||||||
Db   1663 GGAAAATCCGCC 1674
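Each Db row is the codon-by-codon encoding of the Qy row above it. A minimal Python sketch of that check, assuming the standard genetic code (NCBI translation table 1) and an in-frame, ungapped alignment; `translate` and `three_to_one` are illustrative helpers, not part of the search software:

```python
from itertools import product

# Standard genetic code (NCBI table 1); codons enumerated with the first base
# varying slowest, each position in the order T, C, A, G.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(_BASES, repeat=3), _AMINO)}

THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string (length a multiple of 3) to one-letter code."""
    dna = dna.upper().replace(" ", "")
    if len(dna) % 3:
        raise ValueError("length is not a multiple of 3")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def three_to_one(protein: str) -> str:
    """Convert concatenated three-letter codes (e.g. 'MetGlu') to one-letter code."""
    return "".join(THREE_TO_ONE[protein[i:i + 3]] for i in range(0, len(protein), 3))


if __name__ == "__main__":
    qy = "MetGluGluLeuValHisAspLeuValSerAlaLeuGluGluSerSerGluGlnAlaArg"
    db = "ATGGAGGAGCTGGTTCATGACCTTGTCTCAGCATTGGAAGAGAGCTCAGAGCAAGCTCGA"
    print(translate(db) == three_to_one(qy))
    print(translate("GGAAAATCCGCC"))
```

Run on the first and last rows of the AX206855 alignment, the sketch prints `True` and `GKSA`. The same check on the BC054810 rows also exposes synonymous codon changes (for example GAG versus GAA for Glu at query position 2) alongside the substitutions counted as Conservative or Mismatches.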



RESULT 6
BC054810
LOCUS       BC054810    4537 bp    mRNA    linear    ROD 30-JUN-2004
DEFINITION  Mus musculus G patch domain containing 2, mRNA (cDNA clone
            MGC:65681 IMAGE:6839419), complete cds.
ACCESSION   BC054810
VERSION     BC054810.1  GI:32452009
KEYWORDS    MGC.
SOURCE      Mus musculus (house mouse)
  ORGANISM  Mus musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia;
            Sciurognathi; Muroidea; Muridae; Murinae; Mus.
REFERENCE   1  (bases 1 to 4537)
  AUTHORS   Strausberg,R.L., Feingold,E.A., Grouse,L.H., Derge,J.G.,
            Klausner,R.D., Collins,F.S., Wagner,L., Shenmen,C.M.,
            Schuler,G.D., Altschul,S.F., Zeeberg,B., Buetow,K.H., Schaefer,C.F.,
            Bhat,N.K., Hopkins,R.F., Jordan,H., Moore,T., Max,S.I., Wang,J.,
            Hsieh,F., Diatchenko,L., Marusina,K., Farmer,A.A., Rubin,G.M.,
            Hong,L., Stapleton,M., Soares,M.B., Bonaldo,M.F., Casavant,T.L.,
            Scheetz,T.E., Brownstein,M.J., Usdin,T.B., Toshiyuki,S.,
            Carninci,P., Prange,C., Raha,S.S., Loquellano,N.A., Peters,G.J.,
            Abramson,R.D., Mullahy,S.J., Bosak,S.A., McEwan,P.J.,
            McKernan,K.J., Malek,J.A., Gunaratne,P.H., Richards,S.,
            Worley,K.C., Hale,S., Garcia,A.M., Gay,L.J., Hulyk,S.W.,
            Villalon,D.K., Muzny,D.M., Sodergren,E.J., Lu,X., Gibbs,R.A.,
            Fahey,J., Helton,E., Ketteman,M., Madan,A., Rodrigues,S.,
            Sanchez,A., Whiting,M., Madan,A., Young,A.C., Shevchenko,Y.,
            Bouffard,G.G., Blakesley,R.W., Touchman,J.W., Green,E.D.,
            Dickson,M.C., Rodriguez,A.C., Grimwood,J., Schmutz,J., Myers,R.M.,
            Butterfield,Y.S., Krzywinski,M.I., Skalska,U., Smailus,D.E.,
            Schnerch,A., Schein,J.E., Jones,S.J. and Marra,M.A.
  TITLE     Generation and initial analysis of more than 15,000 full-length
            human and mouse cDNA sequences
  JOURNAL   Proc. Natl. Acad. Sci. U.S.A. 99 (26), 16899-16903 (2002)
   PUBMED   12477932
REFERENCE   2  (bases 1 to 4537)
  AUTHORS   Strausberg,R.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-JUL-2003) National Institutes of Health, Mammalian
            Gene Collection (MGC), Cancer Genomics Office, National Cancer
            Institute, 31 Center Drive, Room 11A03, Bethesda, MD 20892-2590,
            USA
  REMARK    NIH-MGC Project URL: http://mgc.nci.nih.gov
COMMENT     Contact: MGC help desk
            Email: cgapbs-r@mail.nih.gov
            Tissue Procurement: Dr. James Lin, University of Iowa
            cDNA Library Preparation: M. Bento Soares, University of Iowa
            cDNA Library Arrayed by: The I.M.A.G.E. Consortium (LLNL)
source 1..4537
/organism="Mus musculus"
/mol_type="mRNA"
/strain="C57BL/6"
/db_xref="taxon:10090"
/clone="MGC:65681 IMAGE:6839419"
/tissue_type="Brain"
/clone_lib="NIH_BMAP_GH0"
/lab_host="DH10B"
/note="Vector: pYX-ASC"
gene 1..4537
/gene="Gpatc2"
/db_xref="GeneID:67769"
/db_xref="MGI:1915019"
CDS 86..1669
/gene="Gpatc2"
/codon_start=1
/product="G patch domain containing 2"
/protein_id="AAH54810.1"
/db_xref="GI:32452010"
/db_xref="GeneID:67769"
/db_xref="MGI:1915019"
/translation="MFGADGRPAIGTAAGKSWHFSRTMEELVHDLVSALEESSEQARG
GFAETGEHSRNLSCPLKRQARKRRGRKRRSYNVHHPWETGHCLSEGSDSSLEEPSKDY
REKHSNNKKDRSDSDDQMLVAKRRPSSNLSSSVRGKRLLWHESDFAVDSLGNRTLRRR
RKVKRMAVDLPQDVSSKRTMTQLPEGCRDQDMDNDRASQYPEFTRKKVKKRKLKGIRP
GPKTQEEGGVLESEERSQPNKDRMEYEEQKASDELRSESDTSSLSSTDAGLFTNDEGR
QGDDEQSDWFYEKESGGACGIAGVVPWWEKDEPAELDTNLPDPVFESILSGSFPLMSH
PGRGGFQARLSRLHGTPSKNIKKSSGAPPSMLPAPGPGSNKRMVHFSPDAHRHDHWFS
PGARTEHGQHQLLRDNRAERGHKKSCSLKTASRQTSMHLGSLCTGDIKRRRKAAPLPG
PTAAGIVGENAQPILESNIGNRMLQSMGWTPGSGLGRDGRGIAEPVQAVQRPKGLGLG
FPLPKSSPTSPAPTSGNPA"

ORIGIN 

Alignment Scores:

Pred. No.: 3.54e-117 Length: 4537

Score: 2318.00 Matches: 429

Percent Similarity: 92.06% Conservative: 35

Best Local Similarity: 85.12% Mismatches: 40

Query Match: 86.04% Indels: 0

DB: 9 Gaps: 0

US-09-771-312-2 (1-504) x BC054810 (1-4537)



Qy      1 MetGluGluLeuValHisAspLeuValSerAlaLeuGluGluSerSerGluGlnAlaArg 20
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    155 ATGGAAGAGCTTGTTCACGATCTTGTCTCTGCACTGGAAGAGAGCTCTGAGCAAGCCCGA 214

Qy     21 GlyGlyPheAlaGluThrGlyAspHisSerArgSerIleSerCysProLeuLysArgGln 40
          |||||||||||||||||||||:::|||||||||::::::|||||||||||||||||||||
Db    215 GGTGGATTTGCTGAAACTGGAGAACATTCTCGAAATCTGTCTTGCCCTCTGAAACGCCAG 274

Qy     41 AlaArgLysArgArgGlyArgLysArgArgSerTyrAsnValHisHisProTrpGluThr 60
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    275 GCCCGGAAAAGGAGAGGGAGGAAGCGGAGATCCTACAATGTTCACCACCCGTGGGAGACA 334

Qy     61 GlyHisCysLeuSerGluGlySerAspSerSerLeuGluGluProSerLysAspTyrArg 80
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    335 GGCCACTGCTTAAGTGAAGGCTCTGATTCTAGTTTAGAAGAACCAAGTAAGGACTATAGA 394

Qy     81 GluAsnHisAsnAsnAsnLysLysAspHisSerAspSerAspAspGlnMetLeuValAla 100
          |||   |||:::|||||||||||||||   ||||||||||||||||||||||||||||||
Db    395 GAGAAGCACAGCAATAATAAAAAGGACCGCAGTGACTCTGATGACCAGATGTTAGTGGCG 454

Qy    101 LysArgArgProSerSerAsnLeuAsnAsnAsnValArgGlyLysArgProLeuTrpHis 120
          ||||||||||||||||||||||||:::::::::|||||||||||||||   |||||||||
Db    455 AAGCGGAGGCCATCTTCAAACCTAAGCAGCAGCGTTCGAGGTAAGCGGCTTCTGTGGCAC 514

Qy    121 GluSerAspPheAlaValAspAsnValGlyAsnArgThrLeuArgArgArgArgLysVal 140
          |||||||||||||||||||||::::::|||||||||||||||||||||||||||||||||
Db    515 GAGTCTGACTTTGCCGTGGACAGCCTTGGGAACAGAACGCTGCGCCGGAGGAGGAAGGTG 574

Qy    141 LysArgMetAlaValAspLeuProGlnAspIleSerAsnLysArgThrMetThrGlnPro 160
          ||||||||||||||||||||||||||||||:::|||:::||||||||||||||||||
Db    575 AAGCGCATGGCCGTGGATCTCCCGCAGGACGTCTCCAGCAAAAGGACAATGACCCAGCTG 634

Qy    161 ProGluGlyCysArgAspGlnAspMetAspSerAspArgAlaTyrGlnTyrGlnGluPhe 180
          ||||||||||||||||||||||||||||||:::|||||||||   ||||||   ||||||
Db    635 CCAGAAGGCTGCAGAGATCAGGACATGGACAATGATAGAGCCAGCCAGTATCCAGAGTTT 694

Qy    181 ThrLysAsnLysValLysLysArgLysLeuLysIleIleArgGlnGlyProLysIleGln 200
          |||:::   ||||||||||||||||||||||||   ||||||   |||||||||   |||
Db    695 ACCCGGAAGAAAGTTAAGAAAAGGAAGTTGAAAGGGATTAGGCCAGGACCGAAAACCCAG 754

Qy    201 AspGluGlyValValLeuGluSerGluGluThrAsnGlnThrAsnLysAspLysMetGlu 220
          :::||||||   ||||||||||||||||||   :::|||   |||||||||:::||||||
Db    755 GAGGAAGGAGGAGTTTTGGAGAGTGAAGAAAGAAGCCAGCCCAACAAGGACAGGATGGAG 814

Qy    221 CysGluGluGlnLysValSerAspGluLeuMetSerGluSerAspSerSerSerLeuSer 240
             ||||||||||||   ||||||||||||   ||||||||||||:::||||||||||||
Db    815 TACGAGGAACAGAAAGCCTCGGATGAGCTCAGGAGCGAAAGTGACACCAGCAGTCTCAGC 874

Qy    241 SerThrAspAlaGlyLeuPheThrAsnAspGluGlyArgGlnGlyAspAspGluGlnSer 260
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    875 AGCACTGACGCGGGCTTGTTCACCAACGATGAGGGAAGACAAGGTGATGATGAGCAGAGT 934

Qy    261 AspTrpPheTyrGluLysGluSerGlyGlyAlaCysGlyIleThrGlyValValProTrp 280
          ||||||||||||||||||||||||||||||||||||||||||   |||||||||||||||
Db    935 GACTGGTTCTATGAGAAGGAGTCAGGCGGAGCGTGCGGGATTGCTGGAGTCGTGCCCTGG 994

Qy    281 TrpGluLysGluAspProThrGluLeuAspLysAsnValProAspProValPheGluSer 300
          |||||||||::::::|||   |||||||||   |||:::|||||||||||||||||||||
Db    995 TGGGAAAAGGATGAGCCAGCAGAGCTGGACACCAACCTGCCCGACCCTGTGTTTGAGAGC 1054

Qy    301 IleLeuThrGlySerPheProLeuMetSerHisProSerArgArgGlyPheGlnAlaArg 320
          ||||||:::|||||||||||||||||||||||||||   |||   |||||||||||||||
Db   1055 ATCTTAAGTGGCTCCTTCCCTCTCATGTCCCATCCTGGCAGAGGAGGTTTCCAAGCTAGA 1114

Qy    321 LeuSerArgLeuHisGlyMetSerSerLysAsnIleLysLysSerGlyGlyThrProThr 340
          ||||||||||||||||||      |||||||||||||||||||||   |||   |||
Db   1115 CTCAGTCGCCTTCATGGAACGCCTTCAAAGAATATTAAAAAGTCTTCAGGGGCTCCACCT 1174

Qy    341 SerMetValProIleProGlyProValGlyAsnLysArgMetValHisPheSerProAsp 360
          ||||||:::|||   |||||||||      ||||||||||||||||||||||||||||||
Db   1175 TCAATGCTGCCTGCTCCTGGCCCTGGCAGTAACAAGAGGATGGTTCACTTCTCCCCAGAC 1234

Qy    361 SerHisHisHisAspHisTrpPheSerProGlyAlaArgThrGluHisAspGlnHisGln 380
          :::|||   |||||||||||||||||||||||||||||||||||||||   |||||||||
Db   1235 GCCCATCGCCATGACCACTGGTTTAGCCCTGGGGCTAGGACAGAGCACGGCCAGCATCAG 1294

Qy    381 LeuLeuArgAspAsnArgAlaGluArgGlyHisLysLysAsnCysSerValArgThrAla 400
          |||||||||||||||||||||||||||||||||||||||:::||||||::::::||||||
Db   1295 CTTCTGAGAGATAACCGAGCGGAAAGAGGACACAAGAAAAGCTGCTCCCTGAAAACAGCC 1354

Qy    401 SerArgGlnThrSerMetHisLeuGlySerLeuCysThrGlyAspIleLysArgArgArg 420
          ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1355 AGCAGGCAGACAAGCATGCATTTAGGATCCTTGTGCACAGGAGACATCAAAAGGAGAAGG 1414

Qy    421 LysAlaAlaProLeuProGlyProThrThrAlaGlyPheValGlyGluAsnAlaGlnPro 440
          |||||||||||||||||||||||||||   ||||||   |||||||||||||||||||||
Db   1415 AAAGCTGCCCCGTTGCCTGGACCCACTGCAGCAGGAATTGTGGGTGAGAACGCCCAGCCG 1474

Qy    441 IleLeuGluAsnAsnIleGlyAsnArgMetLeuGlnAsnMetGlyTrpThrProGlySer 460
          |||||||||:::||||||||||||||||||||||||:::|||||||||||||||||||||
Db   1475 ATCCTAGAGAGCAACATCGGGAACCGCATGCTTCAGAGTATGGGATGGACACCCGGGTCA 1534

Qy    461 GlyLeuGlyArgAspGlyLysGlyIleSerGluProIleGlnAlaMetGlnArgProLys 480
          ||||||||||||||||||:::||||||:::||||||:::||||||:::||||||||||||
Db   1535 GGCCTCGGGCGAGATGGCAGAGGGATCGCGGAGCCAGTTCAAGCCGTTCAGAGGCCGAAA 1594

Qy    481 GlyLeuGlyLeuGlyPheProLeuProLysSerThrSerAlaThrThrThrProAsnAla 500
          |||||||||||||||||||||||||||||||||:::      :::      |||   :::
Db   1595 GGGTTAGGACTTGGATTTCCTCTACCAAAAAGCTCCCCCACCAGCCCTGCCCCCACATCA 1654

Qy    501 GlyLysSerAla 504
          |||      |||
Db   1655 GGAAACCCTGCC 1666
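The three percentages in the BC054810 summary can be reproduced from the raw counts. A minimal Python sketch, assuming that Query Match is the score divided by the perfect score of 2694 (the score reported for the 100%-identical hits) and that both similarity figures are taken over the 504 aligned residues; these definitions are inferred from the reported numbers, not quoted from the search program's documentation, and `summary_percentages` is a hypothetical helper:

```python
def summary_percentages(score: float, perfect_score: float,
                        matches: int, conservative: int,
                        length: int) -> dict[str, float]:
    """Recompute the percentage fields of an alignment summary.

    Assumed definitions (inferred from the reported values):
      Query Match           = score / perfect score
      Best Local Similarity = matches / aligned length
      Percent Similarity    = (matches + conservative) / aligned length
    """
    return {
        "query_match": round(100 * score / perfect_score, 2),
        "best_local_similarity": round(100 * matches / length, 2),
        "percent_similarity": round(100 * (matches + conservative) / length, 2),
    }


if __name__ == "__main__":
    # BC054810: no indels or gaps, so the aligned length is
    # matches + conservative + mismatches = 429 + 35 + 40 = 504.
    print(summary_percentages(2318, 2694, 429, 35, 429 + 35 + 40))
```

This prints 86.04, 85.12 and 92.06, matching the reported Query Match, Best Local Similarity and Percent Similarity; the same call with the identical-hit counts (2694, 504 matches, 0 conservative) gives 100.0 for all three.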
RESULT 1
AAH13916
ID AAH13916 standard; cDNA; 2338 BP.
XX
AC AAH13916;
XX
DT 26-JUN-2001 (first entry)
XX
DE Human cDNA sequence SEQ ID NO: 10937.
XX
KW Human; primer; detection; diagnosis; antisense therapy; gene therapy; ss.
XX
OS Homo sapiens.
XX
PN EP1074617-A2.
XX
PD 07-FEB-2001.
XX
PF 28-JUL-2000; 2000EP-00116126.
XX
PR 29-JUL-1999; 99JP-00248036.
PR 27-AUG-1999; 99JP-00300253.
PR 11-JAN-2000; 2000JP-00118776.
PR 02-MAY-2000; 2000JP-00183767.
PR 09-JUN-2000; 2000JP-00241899.
XX
PA (HELI-) HELIX RES INST.
XX
PI Ota T, Isogai T, Nishikawa T, Hayashi K, Saito K, Yamamoto J;
PI Ishii S, Sugiyama T, Wakamatsu A, Nagai K, Otsuki T;
XX
DR WPI; 2001-318749/34.
XX
PT Primer sets for synthesizing polynucleotides, particularly the 5602 full-
PT length cDNAs defined in the specification, and for the detection and/or
PT diagnosis of the abnormality of the proteins encoded by the full-length
PT cDNAs.
XX
PS Claim 8; SEQ ID NO 10937; 2537pp + Sequence Listing; English.
XX
CC The present invention describes primer sets for synthesising 5602 full-
CC length cDNAs defined in the specification. Where a primer set comprises:
CC (a) an oligo-dT primer and an oligonucleotide complementary to the
CC complementary strand of a polynucleotide which comprises one of the 5602
CC nucleotide sequences defined in the specification, where the
CC oligonucleotide comprises at least 15 nucleotides; or (b) a combination
CC of an oligonucleotide comprising a sequence complementary to the
CC complementary strand of a polynucleotide which comprises a 5'-end
CC sequence and an oligonucleotide comprising a sequence complementary to a
CC polynucleotide which comprises a 3'-end sequence, where the
CC oligonucleotide comprises at least 15 nucleotides and the combination of
CC the 5'-end sequence/3'-end sequence is selected from those defined in the
CC specification. The primer sets can be used in antisense therapy and in
CC gene therapy. The primers are useful for synthesising polynucleotides,
CC particularly full-length cDNAs. The primers are also useful for the
CC detection and/or diagnosis of the abnormality of the proteins encoded by
CC the full-length cDNAs. The primers allow obtaining of the full-length
CC cDNAs easily without any specialised methods. AAH03166 to AAH13628 and
CC AAH13633 to AAH18742 represent human cDNA sequences; AAB92446 to AAB95893
CC represent human amino acid sequences; and AAH13629 to AAH13632 represent
CC oligonucleotides, all of which are used in the exemplification of the
CC present invention.
XX
SQ Sequence 2338 BP; 739 A; 478 C; 541 G; 580 T; 0 U; 0 Other;
Alignment Scores:

Pred. No.: 2.07e-199 Length: 2338

Score: 2694.00 Matches: 504

Percent Similarity: 100.00% Conservative: 0

Best Local Similarity: 100.00% Mismatches: 0

Query Match: 100.00% Indels: 0

DB: 4 Gaps: 0

US-09-771-312-2 (1-504) x AAH13916 (1-2338)

MetGl uGl uLeuVal Hi sAspLeuVal SerAl aLeuGl uGl uSerSerGl uGl nAl aArq 20 
I I I I I I I I I I I I I I I i I I I I I II I I I I I I I I I I I I I I I ! I I I I I I I I I I I I I I i I I I I ll 
ATGGAGGAGCTGGTTCATGACCTTGTCTCAGCATTGGAAGAGAGCTCAGAGCAAGCTCGA 230 

Qy   21 GlyGlyPheAlaGluThrGlyAspHisSerArgSerIleSerCysProLeuLysArgGln 40
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  231 GGTGGATTTGCTGAAACAGGAGACCATTCTCGAAGTATATCTTGCCCTCTGAAACGCCAG 290

Qy   41 AlaArgLysArgArgGlyArgLysArgArgSerTyrAsnValHisHisProTrpGluThr 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  291 GCAAGGAAAAGGAGAGGGAGAAAACGGAGGTCGTATAATGTGCATCACCCGTGGGAGACT 350

Qy   61 GlyHisCysLeuSerGluGlySerAspSerSerLeuGluGluProSerLysAspTyrArg 80
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  351 GGTCACTGCTTAAGTGAAGGCTCTGATTCTAGTTTAGAAGAACCAAGCAAGGACTATAGA 410

Qy   81 GluAsnHisAsnAsnAsnLysLysAspHisSerAspSerAspAspGlnMetLeuValAla 100
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  411 GAGAATCACAATAATAATAAAAAAGATCACAGTGACTCTGATGACCAAATGTTAGTAGCA 470

Qy  101 LysArgArgProSerSerAsnLeuAsnAsnAsnValArgGlyLysArgProLeuTrpHis 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  471 AAGCGCAGGCCGTCATCAAACTTAAATAATAATGTTCGAGGGAAAAGACCTCTATGGCAT 530

Qy  121 GluSerAspPheAlaValAspAsnValGlyAsnArgThrLeuArgArgArgArgLysVal 140
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  531 GAGTCTGATTTTGCTGTGGACAATGTTGGGAATAGAACTCTGCGCAGGAGGAGAAAGGTA 590

Qy  141 LysArgMetAlaValAspLeuProGlnAspIleSerAsnLysArgThrMetThrGlnPro 160
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  591 AAACGCATGGCAGTAGATCTCCCACAGGACATCTCTAACAAACGGACAATGACCCAGCCA 650

Qy  161 ProGluGlyCysArgAspGlnAspMetAspSerAspArgAlaTyrGlnTyrGlnGluPhe 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  651 CCTGAGGGTTGTAGAGATCAGGACATGGACAGTGATAGAGCCTACCAGTATCAAGAATTT 710

Qy  181 ThrLysAsnLysValLysLysArgLysLeuLysIleIleArgGlnGlyProLysIleGln 200
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  711 ACCAAGAACAAAGTCAAAAAAAGAAAGTTGAAAATAATCAGACAAGGACCAAAAATCCAA 770

Qy  201 AspGluGlyValValLeuGluSerGluGluThrAsnGlnThrAsnLysAspLysMetGlu 220
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  771 GATGAAGGAGTAGTTTTAGAAAGTGAGGAAACGAACCAGACCAATAAGGACAAAATGGAA 830

Qy  221 CysGluGluGlnLysValSerAspGluLeuMetSerGluSerAspSerSerSerLeuSer 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  831 TGTGAAGAGCAAAAAGTCTCAGATGAGCTCATGAGTGAAAGTGATTCCAGCAGTCTCAGC 890

Qy  241 SerThrAspAlaGlyLeuPheThrAsnAspGluGlyArgGlnGlyAspAspGluGlnSer 260
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  891 AGCACTGATGCTGGATTGTTTACCAATGATGAGGGAAGACAAGGTGATGATGAACAGAGT 950

Qy  261 AspTrpPheTyrGluLysGluSerGlyGlyAlaCysGlyIleThrGlyValValProTrp 280
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  951 GACTGGTTCTACGAAAAGGAATCAGGTGGAGCATGTGGTATCACTGGAGTTGTGCCCTGG 1010

Qy  281 TrpGluLysGluAspProThrGluLeuAspLysAsnValProAspProValPheGluSer 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1011 TGGGAAAAGGAAGATCCTACTGAGCTAGACAAAAATGTACCAGATCCTGTCTTTGAAAGT 1070
us-09-771-312-2.rng 



Qy  301 IleLeuThrGlySerPheProLeuMetSerHisProSerArgArgGlyPheGlnAlaArg 320
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1071 ATCTTAACTGGTTCTTTTCCCCTTATGTCACACCCAAGCAGAAGAGGTTTCCAAGCTAGA 1130

Qy  321 LeuSerArgLeuHisGlyMetSerSerLysAsnIleLysLysSerGlyGlyThrProThr 340
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1131 CTCAGTCGCCTTCATGGAATGTCTTCAAAGAATATTAAAAAATCTGGAGGGACTCCAACT 1190

Qy  341 SerMetValProIleProGlyProValGlyAsnLysArgMetValHisPheSerProAsp 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1191 TCAATGGTACCCATTCCTGGCCCAGTGGGTAACAAGAGAATGGTTCATTTTTCCCCGGAT 1250

Qy  361 SerHisHisHisAspHisTrpPheSerProGlyAlaArgThrGluHisAspGlnHisGln 380
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1251 TCTCATCACCATGACCATTGGTTTAGCCCTGGGGCTAGGACAGAGCATGACCAGCATCAG 1310

Qy  381 LeuLeuArgAspAsnArgAlaGluArgGlyHisLysLysAsnCysSerValArgThrAla 400
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1311 CTTCTGAGAGATAATCGAGCTGAAAGAGGACACAAGAAAAATTGTTCTGTGAGAACAGCC 1370

Qy  401 SerArgGlnThrSerMetHisLeuGlySerLeuCysThrGlyAspIleLysArgArgArg 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1371 AGCAGGCAAACAAGCATGCATTTAGGATCCTTATGCACGGGAGATATCAAACGGAGAAGA 1430

Qy  421 LysAlaAlaProLeuProGlyProThrThrAlaGlyPheValGlyGluAsnAlaGlnPro 440
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1431 AAAGCTGCACCTTTGCCTGGACCTACTACTGCAGGATTTGTAGGTGAAAATGCCCAGCCA 1490

Qy  441 IleLeuGluAsnAsnIleGlyAsnArgMetLeuGlnAsnMetGlyTrpThrProGlySer 460
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1491 ATCCTAGAAAATAATATTGGAAACCGAATGCTTCAGAATATGGGCTGGACGCCTGGGTCA 1550

Qy  461 GlyLeuGlyArgAspGlyLysGlyIleSerGluProIleGlnAlaMetGlnArgProLys 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1551 GGCCTTGGACGAGATGGCAAGGGGATCTCTGAGCCAATTCAAGCCATGCAGAGGCCAAAG 1610

Qy  481 GlyLeuGlyLeuGlyPheProLeuProLysSerThrSerAlaThrThrThrProAsnAla 500
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1611 GGATTAGGACTTGGATTTCCTCTACCAAAAAGTACTTCCGCAACTACTACCCCCAATGCA 1670

Qy  501 GlyLysSerAla 504
        ||||||||||||
Db 1671 GGAAAATCCGCC 1682



RESULT 2 
ADR99112 

ID ADR99112 standard; DNA; 2338 BP.
XX
AC ADR99112;
XX
DT 02-DEC-2004 (first entry)
XX
DE Hypothetical protein FLJ10252, coding sequence, SEQ ID 118.
XX
KW Cytostatic; breast cancer; cancer; human; gene; ds; FLJ10252.
XX
OS Homo sapiens.
XX
PN WO2004078035-A2.
XX
PD 16-SEP-2004.
XX
PF 27-FEB-2004; 2004WO-US007268.
XX 

PR 28-FEB-2003; 2003US-0450655P. 

XX 

PA (FARB ) BAYER PHARM CORP. 
XX 

PI Eveleigh D, Bigwood D; 

XX 

DR WPI; 2004-653556/63. 

DR P-PSDB; ADR99239. 

DR REFSEQ; NM_018040.1. 

XX 

PT Diagnosing breast cancer comprises comparing the level of expression of 

PT genes or gene products in a first biological sample taken from a patient 

PT with that in a normal patient sample. 
XX 

PS Claim 2; SEQ ID NO 118; 53pp; English.
XX

CC The present invention relates to a method (M1) for diagnosing breast

CC cancer in a patient. The method comprises comparing the level of

CC expression of one or more genes or gene products in a biological sample

CC from the patient with that in a normal patient sample, where a difference

CC in the gene expression in the first sample compared to that in the second

CC sample is a diagnostic of the disease. Also claimed are: method (M2) for

CC distinguishing between normal and disease tissues; method (M3) for

CC monitoring the response of a breast cancer patient to treatment with an

CC anti-cancer agent; method (M4) for identifying a compound for treating

CC breast cancer; and an array for distinguishing between normal and disease

CC tissues comprising two or more probes corresponding to genes selected

CC from ADR98995-ADR99121 or comprising two or more polypeptides selected

CC from ADR99122-ADR99248. In M1 and M2 the genes are selected from ADR98995

CC -ADR99121 and the gene products are polypeptides selected from ADR99122-

CC ADR99248. M1 is useful for diagnosing breast cancer. M2 and the array are

CC useful for distinguishing between normal and disease tissue. M3 is useful

CC for monitoring the response of a breast cancer patient to treatment with

CC an anti-cancer agent. M4 is useful for identifying a compound for

CC treating breast cancer. Note: The sequence data for this patent did not

CC form part of the printed specification, but was obtained in electronic

CC format directly from WIPO at ftp.wipo.int/pub/published_pct_sequences.
XX

SQ Sequence 2338 BP; 739 A; 478 C; 541 G; 580 T; 0 U; 0 Other; 

Alignment Scores:
Pred. No.:              2.07e-199       Length:         2338
Score:                  2694.00         Matches:        504
Percent Similarity:     100.00%         Conservative:   0
Best Local Similarity:  100.00%         Mismatches:     0
Query Match:            100.00%         Indels:         0
DB:                     13              Gaps:           0

US-09-771-312-2 (1-504) x ADR99112 (1-2338)

Qy    1 MetGluGluLeuValHisAspLeuValSerAlaLeuGluGluSerSerGluGlnAlaArg 20
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  171 ATGGAGGAGCTGGTTCATGACCTTGTCTCAGCATTGGAAGAGAGCTCAGAGCAAGCTCGA 230

Qy   21 GlyGlyPheAlaGluThrGlyAspHisSerArgSerIleSerCysProLeuLysArgGln 40
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  231 GGTGGATTTGCTGAAACAGGAGACCATTCTCGAAGTATATCTTGCCCTCTGAAACGCCAG 290

Qy   41 AlaArgLysArgArgGlyArgLysArgArgSerTyrAsnValHisHisProTrpGluThr 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  291 GCAAGGAAAAGGAGAGGGAGAAAACGGAGGTCGTATAATGTGCATCACCCGTGGGAGACT 350
Qy   61 GlyHisCysLeuSerGluGlySerAspSerSerLeuGluGluProSerLysAspTyrArg 80
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  351 GGTCACTGCTTAAGTGAAGGCTCTGATTCTAGTTTAGAAGAACCAAGCAAGGACTATAGA 410

Qy   81 GluAsnHisAsnAsnAsnLysLysAspHisSerAspSerAspAspGlnMetLeuValAla 100
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  411 GAGAATCACAATAATAATAAAAAAGATCACAGTGACTCTGATGACCAAATGTTAGTAGCA 470

Qy  101 LysArgArgProSerSerAsnLeuAsnAsnAsnValArgGlyLysArgProLeuTrpHis 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  471 AAGCGCAGGCCGTCATCAAACTTAAATAATAATGTTCGAGGGAAAAGACCTCTATGGCAT 530

Qy  121 GluSerAspPheAlaValAspAsnValGlyAsnArgThrLeuArgArgArgArgLysVal 140
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  531 GAGTCTGATTTTGCTGTGGACAATGTTGGGAATAGAACTCTGCGCAGGAGGAGAAAGGTA 590

Qy  141 LysArgMetAlaValAspLeuProGlnAspIleSerAsnLysArgThrMetThrGlnPro 160
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  591 AAACGCATGGCAGTAGATCTCCCACAGGACATCTCTAACAAACGGACAATGACCCAGCCA 650

Qy  161 ProGluGlyCysArgAspGlnAspMetAspSerAspArgAlaTyrGlnTyrGlnGluPhe 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  651 CCTGAGGGTTGTAGAGATCAGGACATGGACAGTGATAGAGCCTACCAGTATCAAGAATTT 710

Qy  181 ThrLysAsnLysValLysLysArgLysLeuLysIleIleArgGlnGlyProLysIleGln 200
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  711 ACCAAGAACAAAGTCAAAAAAAGAAAGTTGAAAATAATCAGACAAGGACCAAAAATCCAA 770

Qy  201 AspGluGlyValValLeuGluSerGluGluThrAsnGlnThrAsnLysAspLysMetGlu 220
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  771 GATGAAGGAGTAGTTTTAGAAAGTGAGGAAACGAACCAGACCAATAAGGACAAAATGGAA 830

Qy  221 CysGluGluGlnLysValSerAspGluLeuMetSerGluSerAspSerSerSerLeuSer 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  831 TGTGAAGAGCAAAAAGTCTCAGATGAGCTCATGAGTGAAAGTGATTCCAGCAGTCTCAGC 890

Qy  241 SerThrAspAlaGlyLeuPheThrAsnAspGluGlyArgGlnGlyAspAspGluGlnSer 260
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  891 AGCACTGATGCTGGATTGTTTACCAATGATGAGGGAAGACAAGGTGATGATGAACAGAGT 950

Qy  261 AspTrpPheTyrGluLysGluSerGlyGlyAlaCysGlyIleThrGlyValValProTrp 280
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  951 GACTGGTTCTACGAAAAGGAATCAGGTGGAGCATGTGGTATCACTGGAGTTGTGCCCTGG 1010

Qy  281 TrpGluLysGluAspProThrGluLeuAspLysAsnValProAspProValPheGluSer 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1011 TGGGAAAAGGAAGATCCTACTGAGCTAGACAAAAATGTACCAGATCCTGTCTTTGAAAGT 1070

Qy  301 IleLeuThrGlySerPheProLeuMetSerHisProSerArgArgGlyPheGlnAlaArg 320
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1071 ATCTTAACTGGTTCTTTTCCCCTTATGTCACACCCAAGCAGAAGAGGTTTCCAAGCTAGA 1130

Qy  321 LeuSerArgLeuHisGlyMetSerSerLysAsnIleLysLysSerGlyGlyThrProThr 340
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1131 CTCAGTCGCCTTCATGGAATGTCTTCAAAGAATATTAAAAAATCTGGAGGGACTCCAACT 1190

Qy  341 SerMetValProIleProGlyProValGlyAsnLysArgMetValHisPheSerProAsp 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1191 TCAATGGTACCCATTCCTGGCCCAGTGGGTAACAAGAGAATGGTTCATTTTTCCCCGGAT 1250

Qy  361 SerHisHisHisAspHisTrpPheSerProGlyAlaArgThrGluHisAspGlnHisGln 380
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1251 TCTCATCACCATGACCATTGGTTTAGCCCTGGGGCTAGGACAGAGCATGACCAGCATCAG 1310

Qy  381 LeuLeuArgAspAsnArgAlaGluArgGlyHisLysLysAsnCysSerValArgThrAla 400
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1311 CTTCTGAGAGATAATCGAGCTGAAAGAGGACACAAGAAAAATTGTTCTGTGAGAACAGCC 1370

Qy  401 SerArgGlnThrSerMetHisLeuGlySerLeuCysThrGlyAspIleLysArgArgArg 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1371 AGCAGGCAAACAAGCATGCATTTAGGATCCTTATGCACGGGAGATATCAAACGGAGAAGA 1430

Qy  421 LysAlaAlaProLeuProGlyProThrThrAlaGlyPheValGlyGluAsnAlaGlnPro 440
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1431 AAAGCTGCACCTTTGCCTGGACCTACTACTGCAGGATTTGTAGGTGAAAATGCCCAGCCA 1490

Qy  441 IleLeuGluAsnAsnIleGlyAsnArgMetLeuGlnAsnMetGlyTrpThrProGlySer 460
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1491 ATCCTAGAAAATAATATTGGAAACCGAATGCTTCAGAATATGGGCTGGACGCCTGGGTCA 1550

Qy  461 GlyLeuGlyArgAspGlyLysGlyIleSerGluProIleGlnAlaMetGlnArgProLys 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1551 GGCCTTGGACGAGATGGCAAGGGGATCTCTGAGCCAATTCAAGCCATGCAGAGGCCAAAG 1610

Qy  481 GlyLeuGlyLeuGlyPheProLeuProLysSerThrSerAlaThrThrThrProAsnAla 500
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1611 GGATTAGGACTTGGATTTCCTCTACCAAAAAGTACTTCCGCAACTACTACCCCCAATGCA 1670

Qy  501 GlyLysSerAla 504
        ||||||||||||
Db 1671 GGAAAATCCGCC 1682
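
Each Db row in these translated alignments is the database nucleotide sequence read in frame, with every codon lying under the three-letter residue it encodes in the Qy row. The short Python sketch below reproduces that correspondence for the first row (Qy 1-20 against Db 171-230 of ADR99112). It assumes the standard genetic code (the report does not state which translation table the search used), and the helper names `translate_row`, `CODON_TABLE` and `THREE_LETTER` are illustrative only, not part of the search software.

```python
from itertools import product

# Standard genetic code (assumed), listed in TCAG x TCAG x TCAG codon order;
# '*' marks the three stop codons.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# One-letter to three-letter residue codes, as printed in the Qy rows.
THREE_LETTER = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val", "*": "***",
}


def translate_row(dna: str) -> str:
    """Translate an in-frame nucleotide row into concatenated three-letter codes."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("row length must be a multiple of 3")
    return "".join(
        THREE_LETTER[CODON_TABLE[dna[i:i + 3]]] for i in range(0, len(dna), 3)
    )


if __name__ == "__main__":
    db_row = "ATGGAGGAGCTGGTTCATGACCTTGTCTCAGCATTGGAAGAGAGCTCAGAGCAAGCTCGA"  # Db 171-230
    qy_row = "MetGluGluLeuValHisAspLeuValSerAlaLeuGluGluSerSerGluGlnAlaArg"  # Qy 1-20
    print(translate_row(db_row) == qy_row)  # True
```

The same check applied to the short final row (Db 1671-1682, `GGAAAATCCGCC`) yields `GlyLysSerAla`, matching Qy 501-504.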



RESULT 3 
ABN59701 

ID ABN59701 standard; cDNA; 2344 BP.
XX
AC ABN59701;
XX
DT 28-JUN-2002 (first entry)
XX
DE Novel human coding sequence SEQ ID NO: 112.
XX
KW Human; antianaemic; vulnerary; antiinflammatory; immunomodulator;
KW antiinfertility; cerebroprotective; cytostatic; rheumatic; gene therapy;
KW neuroprotective; antiparkinsonian; protein therapy; EST;
KW expressed sequence tag; gene; ss.
XX
OS Homo sapiens.
XX
PN WO200222660-A2.
XX
PD 21-MAR-2002.
XX
PF 10-SEP-2001; 2001WO-US026015.
XX
PR 11-SEP-2000; 2000US-00659671.
XX
PA (HYSE-) HYSEQ INC.
XX
PI Tang YT, Liu C, Zhou P, Asundi V, Zhang J, Zhao QA, Ren F;
PI Xue AJ, Yang Y, Wehrman T, Drmanac RT;
XX
DR WPI; 2002-292408/33.
DR P-PSDB; ABB97288.
XX
PT An isolated polynucleotide for treating diseases associated with its
PT encoded polypeptide such as cancer and multiple sclerosis.
XX
PS Claim 1; SEQ ID NO 112; 509pp; English.
XX
CC The present invention provides the protein and coding sequences of 444
CC novel human proteins. These were isolated from expressed sequence tags
CC (ESTs). They can be used to stimulate cell growth, to regulate
CC haematopoiesis e.g. to treat aplastic anaemia, to help tissue regrowth
CC e.g. in burn treatment, to regulate the immune system e.g. to treat
CC multiple sclerosis, to regulate activin or inhibin e.g. to treat
CC infertility, to regulate haemostasis or thrombolysis e.g. to treat stroke
CC and cancer, to screen for drugs, to treat inflammatory conditions e.g.
CC rheumatoid arthritis, and to treat nervous system disorders e.g.
CC Parkinson's disease. The present sequence is a coding sequence of the
CC invention.
XX
SQ Sequence 2344 BP; 747 A; 476 C; 541 G; 580 T; 0 U; 0 Other;

Alignment Scores:
Pred. No.:              2.08e-199       Length:         2344
Score:                  2694.00         Matches:        504
Percent Similarity:     100.00%         Conservative:   0
Best Local Similarity:  100.00%         Mismatches:     0
Query Match:            100.00%         Indels:         0
DB:                     6               Gaps:           0

US-09-771-312-2 (1-504) x ABN59701 (1-2344)

Qy    1 MetGluGluLeuValHisAspLeuValSerAlaLeuGluGluSerSerGluGlnAlaArg 20
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  170 ATGGAGGAGCTGGTTCATGACCTTGTCTCAGCATTGGAAGAGAGCTCAGAGCAAGCTCGA 229

Qy   21 GlyGlyPheAlaGluThrGlyAspHisSerArgSerIleSerCysProLeuLysArgGln 40
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  230 GGTGGATTTGCTGAAACAGGAGACCATTCTCGAAGTATATCTTGCCCTCTGAAACGCCAG 289

Qy   41 AlaArgLysArgArgGlyArgLysArgArgSerTyrAsnValHisHisProTrpGluThr 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  290 GCAAGGAAAAGGAGAGGGAGAAAACGGAGGTCGTATAATGTGCATCACCCGTGGGAGACT 349

Qy   61 GlyHisCysLeuSerGluGlySerAspSerSerLeuGluGluProSerLysAspTyrArg 80
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  350 GGTCACTGCTTAAGTGAAGGCTCTGATTCTAGTTTAGAAGAACCAAGCAAGGACTATAGA 409

Qy   81 GluAsnHisAsnAsnAsnLysLysAspHisSerAspSerAspAspGlnMetLeuValAla 100
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  410 GAGAATCACAATAATAATAAAAAAGATCACAGTGACTCTGATGACCAAATGTTAGTAGCA 469

Qy  101 LysArgArgProSerSerAsnLeuAsnAsnAsnValArgGlyLysArgProLeuTrpHis 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  470 AAGCGCAGGCCGTCATCAAACTTAAATAATAATGTTCGAGGGAAAAGACCTCTATGGCAT 529

Qy  121 GluSerAspPheAlaValAspAsnValGlyAsnArgThrLeuArgArgArgArgLysVal 140
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  530 GAGTCTGATTTTGCTGTGGACAATGTTGGGAATAGAACTCTGCGCAGGAGGAGAAAGGTA 589

Qy  141 LysArgMetAlaValAspLeuProGlnAspIleSerAsnLysArgThrMetThrGlnPro 160
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  590 AAACGCATGGCAGTAGATCTCCCACAGGACATCTCTAACAAACGGACAATGACCCAGCCA 649
Qy  161 ProGluGlyCysArgAspGlnAspMetAspSerAspArgAlaTyrGlnTyrGlnGluPhe 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  650

Qy  181 ThrLysAsnLysValLysLysArgLysLeuLysIleIleArgGlnGlyProLysIleGln 200
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  710 ACCAAGAACAAAGTCAAAAAAAGAAAGTTGAAAATAATCAGACAAGGACCAAAAATCCAA 769

Qy  201 AspGluGlyValValLeuGluSerGluGluThrAsnGlnThrAsnLysAspLysMetGlu 220
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  770 GATGAAGGAGTAGTTTTAGAAAGTGAGGAAACGAACCAGACCAATAAGGACAAAATGGAA 829

Qy  221 CysGluGluGlnLysValSerAspGluLeuMetSerGluSerAspSerSerSerLeuSer 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  830 TGTGAAGAGCAAAAAGTCTCAGATGAGCTCATGAGTGAAAGTGATTCCAGCAGTCTCAGC 889

Qy  241 SerThrAspAlaGlyLeuPheThrAsnAspGluGlyArgGlnGlyAspAspGluGlnSer 260
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  890 AGCACTGATGCTGGATTGTTTACCAATGATGAGGGAAGACAAGGTGATGATGAACAGAGT 949

Qy  261 AspTrpPheTyrGluLysGluSerGlyGlyAlaCysGlyIleThrGlyValValProTrp 280
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  950 GACTGGTTCTACGAAAAGGAATCAGGTGGAGCATGTGGTATCACTGGAGTTGTGCCCTGG 1009

Qy  281 TrpGluLysGluAspProThrGluLeuAspLysAsnValProAspProValPheGluSer 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1010 TGGGAAAAGGAAGATCCTACTGAGCTAGACAAAAATGTACCAGATCCTGTCTTTGAAAGT 1069

Qy  301 IleLeuThrGlySerPheProLeuMetSerHisProSerArgArgGlyPheGlnAlaArg 320
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1070 ATCTTAACTGGTTCTTTTCCCCTTATGTCACACCCAAGCAGAAGAGGTTTCCAAGCTAGA 1129

Qy  321 LeuSerArgLeuHisGlyMetSerSerLysAsnIleLysLysSerGlyGlyThrProThr 340
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1130 CTCAGTCGCCTTCATGGAATGTCTTCAAAGAATATTAAAAAATCTGGAGGGACTCCAACT 1189

Qy  341 SerMetValProIleProGlyProValGlyAsnLysArgMetValHisPheSerProAsp 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1190 TCAATGGTACCCATTCCTGGCCCAGTGGGTAACAAGAGAATGGTTCATTTTTCCCCGGAT 1249

Qy  361 SerHisHisHisAspHisTrpPheSerProGlyAlaArgThrGluHisAspGlnHisGln 380
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1250 TCTCATCACCATGACCATTGGTTTAGCCCTGGGGCTAGGACAGAGCATGACCAGCATCAG 1309

Qy  381 LeuLeuArgAspAsnArgAlaGluArgGlyHisLysLysAsnCysSerValArgThrAla 400
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1310 CTTCTGAGAGATAATCGAGCTGAAAGAGGACACAAGAAAAATTGTTCTGTGAGAACAGCC 1369

Qy  401 SerArgGlnThrSerMetHisLeuGlySerLeuCysThrGlyAspIleLysArgArgArg 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1370 AGCAGGCAAACAAGCATGCATTTAGGATCCTTATGCACGGGAGATATCAAACGGAGAAGA 1429

Qy  421 LysAlaAlaProLeuProGlyProThrThrAlaGlyPheValGlyGluAsnAlaGlnPro 440
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1430 AAAGCTGCACCTTTGCCTGGACCTACTACTGCAGGATTTGTAGGTGAAAATGCCCAGCCA 1489

Qy  441 IleLeuGluAsnAsnIleGlyAsnArgMetLeuGlnAsnMetGlyTrpThrProGlySer 460
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1490 ATCCTAGAAAATAATATTGGAAACCGAATGCTTCAGAATATGGGCTGGACGCCTGGGTCA 1549

Qy  461 GlyLeuGlyArgAspGlyLysGlyIleSerGluProIleGlnAlaMetGlnArgProLys 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1550 GGCCTTGGACGAGATGGCAAGGGGATCTCTGAGCCAATTCAAGCCATGCAGAGGCCAAAG 1609

Qy  481 GlyLeuGlyLeuGlyPheProLeuProLysSerThrSerAlaThrThrThrProAsnAla 500
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1610 GGATTAGGACTTGGATTTCCTCTACCAAAAAGTACTTCCGCAACTACTACCCCCAATGCA 1669

Qy  501 GlyLysSerAla 504
        ||||||||||||
Db 1670 GGAAAATCCGCC 1681

RESULT 4 
AAS11663 

ID AAS11663 standard; cDNA; 2345 BP. 
XX 

AC AAS11663; 
XX 

DT 24-OCT-2001 (first entry) 
XX 

DE Prostate and testis-related gene 84P2A9 cDNA.
XX 

KW 84P2A9; PCR primer; DNA adaptor; prostate; testis; tissue; cancer; ss; 

KW leukaemia; tumour; kidney; brain; bone; skin; ovary; breast; pancreas; 

KW colon; lung; cytostatic; gene therapy; antibody therapy; ribozyme; 

KW single chain monoclonal antibody; serum; blood; urine. 
XX 

OS Homo sapiens. 
XX 

PN WO200155391-A2. 
XX 

PD 02-AUG-2001. 
XX 

PF 26-JAN-2001; 2001WO-US002651.
XX 

PR 26-JAN-2000; 2000US-0178560P. 
XX 

PA (UROG-) UROGENESYS INC. 
XX 

PI Jakobovits A, Afar DEH, Challita-Eid PM, Levin E, Mitchell SC;

PI Hubert RS; 

XX 

DR WPI; 2001-502631/55. 

DR P-PSDB; AAU06524. 
XX 

PT New 84P2A9 gene and its encoded protein, useful for diagnosing and 

PT treating cancer, e.g. leukemia and cancer of the prostate, testis, 

PT kidney, brain or bone, or for eliciting an immune response.
XX

PS Claim 1; Fig 2; 149pp; English.
XX 

CC The nucleic acid sequences represent the 84P2A9 gene and the primers and 

CC adaptors used to amplify 84P2A9 DNA. 84P2A9 exhibits prostate and testis

CC specific expression in normal adult tissue, but it is also aberrantly

CC expressed in many cancers including leukaemia and tumours of the

CC prostate, testis, kidney, brain, bone, skin, ovary, breast, pancreas,

CC colon and lung. The 84P2A9 polynucleotide, its related protein and also

CC peptide fragments of the protein are therefore useful for diagnosing and

CC treating cancer. A vector comprising a polynucleotide which encodes a

CC single chain monoclonal antibody, that immunospecifically binds to an

CC 84P2A9-related protein, and a ribozyme capable of cleaving a

CC polynucleotide having the 84P2A9 coding sequence, are both useful in the

CC preparation of a composition for treating a patient with a cancer that

CC expresses 84P2A9. The sequences can be used in diagnostic methods to

CC monitor the level of 84P2A9 gene products in serum, blood, urine and

CC tissue and to thereby detect the presence of cancerous cells.
XX 

SQ Sequence 2345 BP; 750 A; 476 C; 542 G; 577 T; 0 U; 0 Other; 
Alignment Scores:
Pred. No.:              2.08e-199       Length:         2345
Score:                  2694.00         Matches:        504
Percent Similarity:     100.00%         Conservative:   0
Best Local Similarity:  100.00%         Mismatches:     0
Query Match:            100.00%         Indels:         0
DB:                     5               Gaps:           0

US-09-771-312-2 (1-504) x AAS11663 (1-2345)

Qy    1 MetGluGluLeuValHisAspLeuValSerAlaLeuGluGluSerSerGluGlnAlaArg 20
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  163 ATGGAGGAGCTGGTTCATGACCTTGTCTCAGCATTGGAAGAGAGCTCAGAGCAAGCTCGA 222

Qy   21 GlyGlyPheAlaGluThrGlyAspHisSerArgSerIleSerCysProLeuLysArgGln 40
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  223 GGTGGATTTGCTGAAACAGGAGACCATTCTCGAAGTATATCTTGCCCTCTGAAACGCCAG 282

Qy   41 AlaArgLysArgArgGlyArgLysArgArgSerTyrAsnValHisHisProTrpGluThr 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  283 GCAAGGAAAAGGAGAGGGAGAAAACGGAGGTCGTATAATGTGCATCACCCGTGGGAGACT 342

Qy   61 GlyHisCysLeuSerGluGlySerAspSerSerLeuGluGluProSerLysAspTyrArg 80
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  343 GGTCACTGCTTAAGTGAAGGCTCTGATTCTAGTTTAGAAGAACCAAGCAAGGACTATAGA 402

Qy   81 GluAsnHisAsnAsnAsnLysLysAspHisSerAspSerAspAspGlnMetLeuValAla 100
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  403 GAGAATCACAATAATAATAAAAAAGATCACAGTGACTCTGATGACCAAATGTTAGTAGCA 462

Qy  101 LysArgArgProSerSerAsnLeuAsnAsnAsnValArgGlyLysArgProLeuTrpHis 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  463 AAGCGCAGGCCGTCATCAAACTTAAATAATAATGTTCGAGGGAAAAGACCTCTATGGCAT 522

Qy  121 GluSerAspPheAlaValAspAsnValGlyAsnArgThrLeuArgArgArgArgLysVal 140
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  523 GAGTCTGATTTTGCTGTGGACAATGTTGGGAATAGAACTCTGCGCAGGAGGAGAAAGGTA 582

Qy  141 LysArgMetAlaValAspLeuProGlnAspIleSerAsnLysArgThrMetThrGlnPro 160
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  583 AAACGCATGGCAGTAGATCTCCCACAGGACATCTCTAACAAACGGACAATGACCCAGCCA 642

Qy  161 ProGluGlyCysArgAspGlnAspMetAspSerAspArgAlaTyrGlnTyrGlnGluPhe 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  643 CCTGAGGGTTGTAGAGATCAGGACATGGACAGTGATAGAGCCTACCAGTATCAAGAATTT 702

Qy  181 ThrLysAsnLysValLysLysArgLysLeuLysIleIleArgGlnGlyProLysIleGln 200
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  703 ACCAAGAACAAAGTCAAAAAAAGAAAGTTGAAAATAATCAGACAAGGACCAAAAATCCAA 762

Qy  201 AspGluGlyValValLeuGluSerGluGluThrAsnGlnThrAsnLysAspLysMetGlu 220
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  763 GATGAAGGAGTAGTTTTAGAAAGTGAGGAAACGAACCAGACCAATAAGGACAAAATGGAA 822

Qy  221 CysGluGluGlnLysValSerAspGluLeuMetSerGluSerAspSerSerSerLeuSer 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  823 TGTGAAGAGCAAAAAGTCTCAGATGAGCTCATGAGTGAAAGTGATTCCAGCAGTCTCAGC 882

Qy  241 SerThrAspAlaGlyLeuPheThrAsnAspGluGlyArgGlnGlyAspAspGluGlnSer 260
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  883 AGCACTGATGCTGGATTGTTTACCAATGATGAGGGAAGACAAGGTGATGATGAACAGAGT 942
Qy  261 AspTrpPheTyrGluLysGluSerGlyGlyAlaCysGlyIleThrGlyValValProTrp 280
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  943 GACTGGTTCTACGAAAAGGAATCAGGTGGAGCATGTGGTATCACTGGAGTTGTGCCCTGG 1002

Qy  281 TrpGluLysGluAspProThrGluLeuAspLysAsnValProAspProValPheGluSer 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1003 TGGGAAAAGGAAGATCCTACTGAGCTAGACAAAAATGTACCAGATCCTGTCTTTGAAAGT 1062

Qy  301 IleLeuThrGlySerPheProLeuMetSerHisProSerArgArgGlyPheGlnAlaArg 320
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1063 ATCTTAACTGGTTCTTTTCCCCTTATGTCACACCCAAGCAGAAGAGGTTTCCAAGCTAGA 1122

Qy  321 LeuSerArgLeuHisGlyMetSerSerLysAsnIleLysLysSerGlyGlyThrProThr 340
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1123 CTCAGTCGCCTTCATGGAATGTCTTCAAAGAATATTAAAAAATCTGGAGGGACTCCAACT 1182

Qy  341 SerMetValProIleProGlyProValGlyAsnLysArgMetValHisPheSerProAsp 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1183 TCAATGGTACCCATTCCTGGCCCAGTGGGTAACAAGAGAATGGTTCATTTTTCCCCGGAT 1242

Qy  361 SerHisHisHisAspHisTrpPheSerProGlyAlaArgThrGluHisAspGlnHisGln 380
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1243 TCTCATCACCATGACCATTGGTTTAGCCCTGGGGCTAGGACAGAGCATGACCAGCATCAG 1302

Qy  381 LeuLeuArgAspAsnArgAlaGluArgGlyHisLysLysAsnCysSerValArgThrAla 400
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1303 CTTCTGAGAGATAATCGAGCTGAAAGAGGACACAAGAAAAATTGTTCTGTGAGAACAGCC 1362

Qy  401 SerArgGlnThrSerMetHisLeuGlySerLeuCysThrGlyAspIleLysArgArgArg 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1363 AGCAGGCAAACAAGCATGCATTTAGGATCCTTATGCACGGGAGATATCAAACGGAGAAGA 1422

Qy  421 LysAlaAlaProLeuProGlyProThrThrAlaGlyPheValGlyGluAsnAlaGlnPro 440
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1423 AAAGCTGCACCTTTGCCTGGACCTACTACTGCAGGATTTGTAGGTGAAAATGCCCAGCCA 1482

Qy  441 IleLeuGluAsnAsnIleGlyAsnArgMetLeuGlnAsnMetGlyTrpThrProGlySer 460
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1483 ATCCTAGAAAATAATATTGGAAACCGAATGCTTCAGAATATGGGCTGGACGCCTGGGTCA 1542

Qy  461 GlyLeuGlyArgAspGlyLysGlyIleSerGluProIleGlnAlaMetGlnArgProLys 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1543 GGCCTTGGACGAGATGGCAAGGGGATCTCTGAGCCAATTCAAGCCATGCAGAGGCCAAAG 1602

Qy  481 GlyLeuGlyLeuGlyPheProLeuProLysSerThrSerAlaThrThrThrProAsnAla 500
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1603 GGATTAGGACTTGGATTTCCTCTACCAAAAAGTACTTCCGCAACTACTACCCCCAATGCA 1662

Qy  501 GlyLysSerAla 504
        ||||||||||||
Db 1663 GGAAAATCCGCC 1674



RESULT 5 
ACN91982 

ID ACN91982 standard; DNA; 2583 BP.
XX
AC ACN91982;
XX
DT 02-DEC-2004 (first entry)
XX
DE Breast cancer related marker, seq id 13132.
XX
KW Cancer; breast; tumour; cytostatic; marker; detection; therapy; ds.
XX

OS Homo sapiens. 
XX 

PN US2003099974-A1. 
XX 

PD 29-MAY-2003. 
XX 

PF 18-JUL-2002; 2002US-00198846.
XX 

PR 18-JUL-2001; 2001US-0306220P. 
XX 

PA (MILL-) MILLENNIUM PHARM INC. 
XX 

PI Li Hie J, Xu Y, Wang Y, Steinmann K; 
XX 

DR WPI; 2003-787014/74. 
XX 

PT Novel isolated polypeptide associated with breast cancer, useful for 

PT detecting presence of polypeptide in sample, as a marker for breast 

PT cancer. 
XX 

PS Disclosure; SEQ ID NO 13132; 36pp; English. 
XX 

CC The invention relates to an isolated polypeptide (I) associated with
CC breast cancer which is encoded by a nucleic acid molecule comprising a
CC nucleotide sequence (Si). Further disclosed is an antibody that binds to
CC the polypeptide of the invention. The activity of the polypeptide of the
CC invention may be described as cytostatic. The antibody is useful for
CC detecting the presence of (I) in a sample. Nucleic acid molecules of the
CC invention are useful in the detection of breast tumours. (I) is useful as
CC a marker for breast cancer and in breast cancer therapy. Sequences given
CC in records ACN78851-ACN92934 represent nucleic acid markers associated
CC with breast cancer. Note: The sequence listing does not form part of the
CC specification but may be obtained in electronic format from the USPTO web
CC site at seqdata.uspto.gov/sequence.html?DocID=20030099974
XX 

SQ Sequence 2583 BP; 813 A; 519 C; 575 G; 664 T; 0 U; 12 Other; 

Alignment Scores:
Pred. No.:             2.35e-199    Length:        2583
Score:                 2694.00      Matches:       504
Percent Similarity:    100.00%      Conservative:  0
Best Local Similarity: 100.00%      Mismatches:    0
Query Match:           100.00%      Indels:        0
DB:                    11           Gaps:          0

US-09-771-312-2 (1-504) x ACN91982 (1-2583)

Qy    1 MetGluGluLeuValHisAspLeuValSerAlaLeuGluGluSerSerGluGlnAlaArg 20
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  199 ATGGAGGAGCTGGTTCATGACCTTGTCTCAGCATTGGAAGAGAGCTCAGAGCAAGCTCGA 258

Qy   21 GlyGlyPheAlaGluThrGlyAspHisSerArgSerIleSerCysProLeuLysArgGln 40
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  259 GGTGGATTTGCTGAAACAGGAGACCATTCTCGAAGTATATCTTGCCCTCTGAAACGCCAG 318

Qy   41 AlaArgLysArgArgGlyArgLysArgArgSerTyrAsnValHisHisProTrpGluThr 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  319 GCAAGGAAAAGGAGAGGGAGAAAACGGAGGTCGTATAATGTGCATCACCCGTGGGAGACT 378

Qy   61 GlyHisCysLeuSerGluGlySerAspSerSerLeuGluGluProSerLysAspTyrArg 80
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  379 GGTCACTGCTTAAGTGAAGGCTCTGATTCTAGTTTAGAAGAACCAAGCAAGGACTATAGA 438

Qy   81 GluAsnHisAsnAsnAsnLysLysAspHisSerAspSerAspAspGlnMetLeuValAla 100
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  439 GAGAATCACAATAATAATAAAAAAGATCACAGTGACTCTGATGACCAAATGTTAGTAGCA 498

Qy  101 LysArgArgProSerSerAsnLeuAsnAsnAsnValArgGlyLysArgProLeuTrpHis 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  499 AAGCGCAGGCCGTCATCAAACTTAAATAATAATGTTCGAGGGAAAAGACCTCTATGGCAT 558

Qy  121 GluSerAspPheAlaValAspAsnValGlyAsnArgThrLeuArgArgArgArgLysVal 140
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  559 GAGTCTGATTTTGCTGTGGACAATGTTGGGAATAGAACTCTGCGCAGGAGGAGAAAGGTA 618

Qy  141 LysArgMetAlaValAspLeuProGlnAspIleSerAsnLysArgThrMetThrGlnPro 160
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  619 AAACGCATGGCAGTAGATCTCCCACAGGACATCTCTAACAAACGGACAATGACCCAGCCA 678

Qy  161 ProGluGlyCysArgAspGlnAspMetAspSerAspArgAlaTyrGlnTyrGlnGluPhe 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  679 CCTGAGGGTTGTAGAGATCAGGACATGGACAGTGATAGAGCCTACCAGTATCAAGAATTT 738

Qy  181 ThrLysAsnLysValLysLysArgLysLeuLysIleIleArgGlnGlyProLysIleGln 200
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  739 ACCAAGAACAAAGTCAAAAAAAGAAAGTTGAAAATAATCAGACAAGGACCAAAAATCCAA 798

Qy  201 AspGluGlyValValLeuGluSerGluGluThrAsnGlnThrAsnLysAspLysMetGlu 220
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  799 GATGAAGGAGTAGTTTTAGAAAGTGAGGAAACGAACCAGACCAATAAGGACAAAATGGAA 858

Qy  221 CysGluGluGlnLysValSerAspGluLeuMetSerGluSerAspSerSerSerLeuSer 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  859 TGTGAAGAGCAAAAAGTCTCAGATGAGCTCATGAGTGAAAGTGATTCCAGCAGTCTCAGC 918

Qy  241 SerThrAspAlaGlyLeuPheThrAsnAspGluGlyArgGlnGlyAspAspGluGlnSer 260
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  919 AGCACTGATGCTGGATTGTTTACCAATGATGAGGGAAGACAAGGTGATGATGAACAGAGT 978

Qy  261 AspTrpPheTyrGluLysGluSerGlyGlyAlaCysGlyIleThrGlyValValProTrp 280
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  979 GACTGGTTCTACGAAAAGGAATCAGGTGGAGCATGTGGTATCACTGGAGTTGTGCCCTGG 1038

Qy  281 TrpGluLysGluAspProThrGluLeuAspLysAsnValProAspProValPheGluSer 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1039 TGGGAAAAGGAAGATCCTACTGAGCTAGACAAAAATGTACCAGATCCTGTCTTTGAAAGT 1098

Qy  301 IleLeuThrGlySerPheProLeuMetSerHisProSerArgArgGlyPheGlnAlaArg 320
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1099 ATCTTAACTGGTTCTTTTCCCCTTATGTCACACCCAAGCAGAAGAGGTTTCCAAGCTAGA 1158

Qy  321 LeuSerArgLeuHisGlyMetSerSerLysAsnIleLysLysSerGlyGlyThrProThr 340
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1159 CTCAGTCGCCTTCATGGAATGTCTTCAAAGAATATTAAAAAATCTGGAGGGACTCCAACT 1218

Qy  341 SerMetValProIleProGlyProValGlyAsnLysArgMetValHisPheSerProAsp 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1219 TCAATGGTACCCATTCCTGGCCCAGTGGGTAACAAGAGAATGGTTCATTTTTCCCCGGAT 1278

Qy  361 SerHisHisHisAspHisTrpPheSerProGlyAlaArgThrGluHisAspGlnHisGln 380
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1279 TCTCATCACCATGACCATTGGTTTAGCCCTGGGGCTAGGACAGAGCATGACCAGCATCAG 1338

Qy  381 LeuLeuArgAspAsnArgAlaGluArgGlyHisLysLysAsnCysSerValArgThrAla 400
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1339 CTTCTGAGAGATAATCGAGCTGAAAGAGGACACAAGAAAAATTGTTCTGTGAGAACAGCC 1398

Qy  401 SerArgGlnThrSerMetHisLeuGlySerLeuCysThrGlyAspIleLysArgArgArg 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1399 AGCAGGCAAACAAGCATGCATTTAGGATCCTTATGCACGGGAGATATCAAACGGAGAAGA 1458

Qy  421 LysAlaAlaProLeuProGlyProThrThrAlaGlyPheValGlyGluAsnAlaGlnPro 440
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1459 AAAGCTGCACCTTTGCCTGGACCTACTACTGCAGGATTTGTAGGTGAAAATGCCCAGCCA 1518

Qy  441 IleLeuGluAsnAsnIleGlyAsnArgMetLeuGlnAsnMetGlyTrpThrProGlySer 460
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1519 ATCCTAGAAAATAATATTGGAAACCGAATGCTTCAGAATATGGGCTGGACGCCTGGGTCA 1578

Qy  461 GlyLeuGlyArgAspGlyLysGlyIleSerGluProIleGlnAlaMetGlnArgProLys 480
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1579 GGCCTTGGACGAGATGGCAAGGGGATCTCTGAGCCAATTCAAGCCATGCAGAGGCCAAAG 1638

Qy  481 GlyLeuGlyLeuGlyPheProLeuProLysSerThrSerAlaThrThrThrProAsnAla 500
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1639 GGATTAGGACTTGGATTTCCTCTACCAAAAAGTACTTCCGCAACTACTACCCCCAATGCA 1698

Qy  501 GlyLysSerAla 504
        ||||||||||||
Db 1699 GGAAAATCCGCC 1710 
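
In this 100% identity alignment, each Db row is the in-frame nucleotide sequence whose codons encode the Qy residues directly above it. The sketch below checks that for the first row by translating the nucleotides codon by codon. It assumes the standard genetic code (the report does not name the translation table used for this alignment) and a row that starts on a codon boundary; the helper `translate_row` is hypothetical and not part of GenCore.

```python
# Sketch: translate an in-frame Db row into three-letter residue codes,
# assuming the standard genetic code (an assumption, not stated in the report).
BASES = "TCAG"
# One-letter amino acids for codons in the order TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
THREE = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
    "*": "***",  # placeholder for stop codons (illustrative choice)
}

# Map every codon to its one-letter amino acid.
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_row(dna: str) -> str:
    """Translate an in-frame DNA row codon by codon into three-letter codes."""
    if len(dna) % 3:
        raise ValueError("row length is not a multiple of 3")
    return "".join(THREE[CODON_TABLE[dna[i:i + 3]]] for i in range(0, len(dna), 3))


# First Qy/Db row of the ACN91982 alignment.
db_row = "ATGGAGGAGCTGGTTCATGACCTTGTCTCAGCATTGGAAGAGAGCTCAGAGCAAGCTCGA"
qy_row = "MetGluGluLeuValHisAspLeuValSerAlaLeuGluGluSerSerGluGlnAlaArg"
print(translate_row(db_row) == qy_row)  # True
```

Rows in the gapped alignments (for example AAS72189) break the reading frame, so a check like this only applies to ungapped, in-frame rows.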

RESULT 6 
AAS72189 

ID AAS72189 standard; cDNA; 1563 BP. 
XX 

AC AAS72189; 
XX 

DT 13-FEB-2002 (first entry) 
XX 

DE DNA encoding novel human diagnostic protein #7993. 
XX

KW Human; chromosome mapping; gene mapping; gene therapy; forensic; 

KW food supplement; medical imaging; diagnostic; genetic disorder; ss. 
XX 

OS Homo sapiens. 
XX 

PN WO200175067-A2. 
XX 

PD 11-OCT-2001.
XX 

PF 30-MAR-2001; 2001WO-US008631. 
XX 

PR 31-MAR-2000; 2000US-00540217. 

PR 23-AUG-2000; 2000US-00649167. 
XX 

PA (HYSE-) HYSEQ INC. 
XX 

PI Drmanac RT, Liu C, Tang YT;
XX 

DR WPI; 2001-639362/73. 

DR P-PSDB; ABG08002. 
XX 

PT New isolated polynucleotide and encoded polypeptides, useful in 

PT diagnostics, forensics, gene mapping, identification of mutations 

PT responsible for genetic disorders or other traits and to assess 

PT biodiversity. 
XX 

PS Claim 1; SEQ ID NO 7993; 103pp; English.
XX

CC The invention relates to isolated polynucleotide (I) and polypeptide (II)
CC sequences. (I) is useful as hybridisation probes, polymerase chain
CC reaction (PCR) primers, oligomers, and for chromosome and gene mapping,
CC and in recombinant production of (II). The polynucleotides are also used
CC in diagnostics as expressed sequence tags for identifying expressed
CC genes. (I) is useful in gene therapy techniques to restore normal
CC activity of (II) or to treat disease states involving (II). (II) is
CC useful for generating antibodies against it, detecting or quantitating a
CC polypeptide in tissue, as molecular weight markers and as a food
CC supplement. (II) and its binding partners are useful in medical imaging
CC of sites expressing (II). (I) and (II) are useful for treating disorders
CC involving aberrant protein expression or biological activity. The
CC polypeptide and polynucleotide sequences have applications in
CC diagnostics, forensics, gene mapping, identification of mutations
CC responsible for genetic disorders or other traits to assess biodiversity
CC and to produce other types of data and products dependent on DNA and
CC amino acid sequences. AAS64197-AAS94564 represent novel human diagnostic
CC coding sequences of the invention. Note: The sequence data for this
CC patent did not appear in the printed specification, but was obtained in
CC electronic format directly from WIPO at
CC ftp.wipo.int/pub/published_pct_sequences

XX 

SQ Sequence 1563 BP; 486 A; 326 C; 395 G; 356 T; 0 U; 0 Other; 
Alignment Scores: 

Pred. No.: 1.55e-106 Length: 1563 

Score: 1495.50 Matches: 312 

Percent Similarity: 69.80% Conservative: 7 

Best Local Similarity: 68.27% Mismatches: 21 

Query Match: 55.51% Indels: 117 

DB: 5 Gaps: 2 
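
The percentages in this block appear to follow from the counts beside them: Query Match is the Score divided by the Perfect score (1495.50 / 2694), and both similarity figures are taken over the 457 aligned positions (312 matches + 7 conservative + 21 mismatches + 117 indels). This relationship is inferred from the numbers in this report, not from GenCore documentation; `alignment_percentages` below is a hypothetical helper that reproduces them.

```python
def alignment_percentages(score, perfect_score, matches, conservative,
                          mismatches, indels):
    """Hypothetical helper: percentages inferred from the GenCore count fields."""
    # Aligned positions = every column counted as a match, a conservative
    # substitution, a mismatch, or an indel.
    aligned = matches + conservative + mismatches + indels
    return {
        "query_match": 100.0 * score / perfect_score,
        "best_local_similarity": 100.0 * matches / aligned,
        "percent_similarity": 100.0 * (matches + conservative) / aligned,
    }


# Counts from the AAS72189 alignment; perfect score 2694 from the search header.
stats = alignment_percentages(1495.5, 2694, 312, 7, 21, 117)
for name, value in stats.items():
    print(f"{name}: {value:.2f}%")
# query_match: 55.51%
# best_local_similarity: 68.27%
# percent_similarity: 69.80%
```

The same formulas also reproduce the T47582 figures in the PIR search (Query Match 6.6%, and Best Local Similarity 23.0% from 90 matches over 391 positions).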

US-09-771-312-2 (1-504) x AAS72189 (1-1563) 

Qy 1 MetGluGluLeuValHisAspLeuValSerAlaLeuGlu GluSerSerGluGlnAla 19

I I I I I I I I I I M I I I I I I II I I I I I I I I I I I I I II I I I I Ml ::: 

Db 155 ATGGAGGAGCTGGTTCATGACCTTGTCTCAGCATTGGAAAGAGAGCTCCAGAGCAAGCCT 214 

Qy 20 ArgGlyGlyPheAlaGluThrGlyAsp-HisSer-ArgSerIleSerCysProLeuLysA 39

i in 1 1 1 1 1 1 1 1 1 1 1 1 1 1 Mini mini mm 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

Db 215 CGAGGTGGATTTGCTGAACCAGGAGACCCATTCTCCGAAGTATATCTTGCCCTCTGAAAC 274 

Qy 39 rgGlnAlaArgLysArgArgGlyArgLysArgArg-SerTyrAsnValHisHisProTrp 58

IT IIIIITIIIIITIITIIIIITIIIIITIIT 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 

Db 275 GCCCAGCAAGGAAAAGGAGAGGGAGAAAACGGAGGTTCGTATAATGTGCATCACCCGTGG 334 

Qy 59 Glu-ThrGlyHisCysLeu--SerGluGlySerAspSerSerLeuGluGluProSerLys 77 

III IIIIIIIM III I I I II II I II I II I I I I I I I 
Db 335 GAGGACTGGTCACTGGCTTAAAGTGAAGGCTCTGATTCTAGT 376 

Qy 78 AspTyrArgGluAsnHisAsnAsnAsnLysLysAspHisSerAspSerAspAspGlnMet 97 

Db 376 376 

Qy 98 LeuValAlaLysArgArgProSerSerAsnLeuAsnAsnAsnValArgGlyLysArgPro 117 

Db 376 376 

Qy 118 LeuTrpHisGluSerAspPheAlaValAspAsnValGlyAsnArgThrLeuArgArgArg 137 

Db 376 376 



Qy 138 ArgLysValLysArgMetAlaValAspLeuProGlnAspIleSerAsnLysArgThrMet 157

Db 376 376

Qy 158 ThrGlnProProGluGlyCysArgAspGlnAspMetAspSerAspArgAlaTyrGlnTyr 177

Db 376 376

Qy 178 GlnGluPheThrLysAsnLysValLysLysArgLysLeuLysIleIleArgGlnGlyPro 197
III ::: Ml 1 1 1 1 1 1 1 1 T 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 T 1 1 1 1 1 1 1 1 1
Db 377 TTTAGAAGAACAAAGTCAAAAAAAAGAAAGTTGAAAATAATCAGACAAGGACCA 430

Qy 198 LysIleGlnAspGluGlyValValLeuGluSerGluGluThrAsnGlnThrAsnLysAsp 217
1 1 1 1 1 I 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 1 1 1 I 1 1 1 1 1 I 1 1 1 1 1 1 1 I 1 1 II 1 1
Db 431 AAAATCCAAGATGAAGGAGTAGTTTTAGAAAGTGAGGAAACGAACCAGACCAATAAGGAC 490

Qy 218 LysMetGluCysGluGluGlnLysValSerAspGluLeuMetSerGluSerAspSerSer 237
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 II 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 491 AAAATGGAATGTGAAGAGCAAAAAGTCTCAGATGAGCTCATGAGTGAAAGTGATTCCAGC 550

Qy 238 SerLeuSerSerThrAspAlaGlyLeuPheThrAsnAspGluGlyArgGlnGlyAspAsp 257
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II II 1 1 1 1 1 ; 1 1 1 1 1 1 1 111 1 1 1 1 1 1 1 1 1 1 1
Db 551 AGTCTCAGCAGCACTGATGCTGGATTGTTTACCAATGATGAGGGAAGACAAGGTGATGAT 610

Qy 258 GluGlnSerAspTrpPheTyrGluLysGluSerGlyGlyAlaCysGlyIleThrGlyVal 277
1 1 1 1 1 1 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 1 Ml 1 1 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 611 GAACAGAGTGACTGGTTCTACGAAAAGGAATCAGGTGGAGCATGTGGTATCACTGGAGTT 670

Qy 278 ValProTrpTrpGluLysGluAspProThrGluLeuAspLysAsnValProAspProVal 297
1 1 M II II 1 M 1 II II II 1! II II 1 1 1 1 Mill 1 1 II M 1 II 1 II II 1 1 II 1 1 1 1 1 II II
Db 671 GTGCCCTGGTGGGAAAAGGAAGATCCTACTGAGCTAGACAAAAATGTACCAGATCCTGTC 730

Qy 298 PheGluSerIleLeuThrGlySerPheProLeuMetSerHisProSerArgArgGly-Ph 317
1 1 1 1 1 1 1 1 1 M 1 1 1 M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 11 1 1 1 1 111 111 1 1 II
Db 731 TTTGAAAGTATCTTAACTGGTTCTTTTCCCCTTATGTCACACCCAAGCAGAAGAGGTTTT 790

Qy 317 eGlnAlaArgLeuSerArg-LeuHisGlyMetSerSerLysAsnIleLysLysSerGlyG 337
MM 1 1 T 1 1 1 1 1 1 1 1 T 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Db 791 CCAACTAAGACTCAGTCGGCCTTCATGGAATGTCTTCAAAGAATATTAAAAAATCTGGAG 850

Qy 337 lyThrProThrSerMetValProIleProGlyProValGlyAsnLysArgMetValHisP 357
1 1 1 1 1 1 1 1 1 M 1 1 1 1 M M 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ! 1 1 1 1 M 1 1 1 111 1 1 1 1 1 1 1 1
Db 851 GGACTCCAACTTCAATGGTACCCATTCCTGGCCCAGTGGGTAACAAGAGAATGGTTCATT 910

Qy 357 heSerProAspSerHisHisHisAspHisTrpPheSerProGlyAlaArgThrGluHisA 377
II 1 1 1 1 1 1 II 1 1 II 1 1 1 1 1 1 1 1 1 1 M 1 II 1 M 1 1 1 1 1 II 1 1 1 1 II 1 1 Ml II 1 1 1 II II 1
Db 911 TTTCCCCGGATTCTCATCACCATGACCATTGGTTTAGCCCTGGGGCTAGGACAGAGCATG 970

Qy 377 spGlnHisGlnLeuLeuArgAspAsnArgAlaGluArgGlyHisLysLysAsnCysSerV 397
1 M M M M M 1 M 1 1 1 1 Ml 1 1 II M MM M II Mil II II 1 II 1 II 1 M II 1 M II 1
Db 971 ACCAGCATCAGCTTCTGAGAGATAATCGAGCTGAAAGAGGACACAAGAAAAATTGTTCTG 1030

Qy 397 alArgThrAlaSerArgGlnThrSerMetHisLeuGlySerLeuCysThrGlyAspIleL 417
M 1 111 1 1 1 1 II 1 Mil 1 1 II 1 1 1 1 II 1 II II 1 1 II 1 1 1 1 1 1 1 1 1 1 1 1 1 1 II 1 II 1 II
Db 1031 TGAGAACAGCCAGCAGGCAAACAAGCATGCATTTAGGATCCTTATGCACGGGAGATATCA 1090

Qy 417 ysArgArgArgLysAlaAlaProLeuProGlyProThrThrAlaGlyPheValGlyGluA 437
1 1 111 111 .11 1 1 II II 1 1 1 II 1 II 1 1 III II 1 1 1 II M II 1 1 :::
Db 1091 AACGGAGAAGAAAAGCTGCACCTTTGCCTGGACCTACTACTGCAGATTATTTCTCCCCCA 1150

Qy 437 snAlaGlnProIleLeuGluAsnAsnIleGlyAsn 448
:::|||:::::: III:::
Db 1151 TTCCCAAGCCAGTTATAGTAAAAGAATGTGGAAGT 1185

GenCore version 5.1.6
Copyright (c) 1993 - 2005 Compugen Ltd.

OM protein - protein search, using sw model

Run on:         December 4, 2005, 10:07:40 ; Search time 41 Seconds
                (without alignments)
                1182.762 Million cell updates/sec

Title:          US-09-771-312-2
Perfect score:  2694
Sequence:       1 MEELVHDLVSALEESSEQAR GFPLPKSTSATTTPNAGKSA 504

Scoring table:  BLOSUM62
                Gapop 10.0, Gapext 0.5

Searched:       283416 seqs, 96216763 residues

Total number of hits satisfying chosen parameters: 283416

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
Listing first 45 summaries

Database :      PIR_80:*
                1: pir1:*
                2: pir2:*
                3: pir3:*
                4: pir4:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 



SUMMARIES

                %
Result         Query
  No.   Score  Match  Length  DB  ID       Description

    1   177.5    6.6    1105   2  T47582   hypothetical prote
    2   167.5    6.2     767   2  S63182   hypothetical prote
    3     156    5.8     542   2  T46464   hypothetical prote
    4     149    5.5     695   2  T40168   hypothetical prote
    5   148.5    5.5    1403   1  A47328   natural killer cel
    6   146.5    5.4     669   2  T28754   hypothetical prote
    7     143    5.3    1577   2  T19722   hypothetical prote
    8     143    5.3    3498   2  T22330   hypothetical prote
    9   138.5    5.1     368   2  G88636   protein W09G12.7 [
   10   135.5    5.0     643   2  A96636   unknown protein, 7
   11   134.5    5.0     699   2  I38073   nucleolar phosphop
   12   134.5    5.0     896   2  D96556   hypothetical prote
   13   133.5    5.0    1672   2  T46237   hypothetical prote
   14     133    4.9     705   2  D88536   acidic protein - P
   15     133    4.9     705   2  S27786   acidic protein - C
   16     133    4.9     943   2  A42681
   17   131.5    4.9     425   2  S55147   hypothetical prote
   18     130    4.8     608   2  T02299   hypothetical prote
   19     130    4.8     679   2  S48437   hypothetical prote
   20   129.5    4.8    2526   2  T20531   hypothetical prote
   21   129.5    4.8    2722   2  T20532   hypothetical prote
   22   129.5    4.8    2738   2  E88320   protein F07A11.6 [
   23   128.5    4.8     543   2  T27190   hypothetical prote
   24   128.5    4.8     552   2  T27191   hypothetical prote
   25   128.5    4.8     954   2  E86174   protein F19P19
   26   127.5    4.7     493   2  T02376   hypothetical prote
   27   127.5    4.7           2           hypothetical prote
   28     127    4.7           2  T08929   hypothetical prote
   29     127    4.7     786   2  T33856   hypothetical prote
   30     127    4.7     845   2  A45669   neurofilament trip
   31     127    4.7     963   2  T04002   hypothetical prote
   32   126.5    4.7     390   2  T34137   hypothetical prote
   33            4.7           2
   34     126    4.7    1032   2  A57514   RNA helicase HEL11
   35     125    4.6
   36            4.6
   37            4.6
   38     124    4.6    1166   2  H86341   hypothetical prote
   39   123.5    4.6     849   2  E86306   similar to tufteli
   40                          2
   41            4.5           2           hypothetical prote
   42     122    4.5           2  T22456   hypothetical prote
   43            4.5     971   2           hypothetical prote
   44            4.5           2
   45            4.5           2

ALIGNMENTS



RESULT 1 
T47582 

hypothetical protein F24B22.190 - Arabidopsis thaliana
C;Species: Arabidopsis thaliana (mouse-ear cress)
C;Date: 20-Apr-2000 #sequence_revision 20-Apr-2000 #text_change 09-Jul-2004
C;Accession: T47582
R;Bloecker, H.; Mewes, H.W.; Lemcke, K.; Mayer, K.F.X.; Quetier, F.; Salanoubat, M.
submitted to the Protein Sequence Database, January 2000
A;Reference number: Z23016
A;Accession: T47582
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 1-1105 <BLO>
A;Cross-references: UNIPROT:Q9M383; UNIPARC:UPI00000A410D; EMBL:AL132957
A;Experimental source: cultivar Columbia; BAC clone F24B22
C;Genetics:
A;Map position: 3
A;Introns: 35/3; 56/2; 294/3; 318/3; 349/3; 376/2; 426/3; 455/1; 485/3; 508/3;
568/3; 633/1; 662/3; 681/3; 710/2; 981/1; 1043/3
A;Note: F24B22.190



Query Match 6.6%; Score 177.5; DB 2; Length 1105; 

Best Local Similarity 23.0%; Pred. No. 0.00075; 

Matches 90; Conservative 48; Mismatches 146; Indels 107; Gaps 14; 

Qy 158 TQPPEGCRDQDMDSDRAYQYQEFTKNKVKKRKLKIIRQGPKIQDEGVVLESEETNQT 214

II I I I : : I : : I : I I : I : I : II : I 

Db 722 TQQYVPCPDQNNES-KVTENQPDSAKKEKSSQQKVIISAATTPNVEKVLSLPDAVQAAAA 780 

Qy 215 NKDKMECEEQKVSDELMSESDSSSLSSTDAGLFTNDEGRQGDDEQSDWFYEKESGGA 271 

: I I I I I : I : I I : I : I : 

Db 781 AAIASEKREKERVK EIKLASKTSLLAS KKKMSNV 814 

Qy 272 CGITGVVPWWEKEDPTELDKNVPDPVFESILTGSFPLMSHPSRRGF 317

: I : : I : : : I I : I : I : I I 

Db 815 LTMWKQRSHETQIQRPSPS LGDNP PTVSAEARS S FSTGQSMGKLKS DVI 863 

Qy 318 — QARLSRLHGMS SKNIKKSGGT PTSMVPIPG--PVG 350 

: I : I I : I I : MM : : I M I 

Db 864 I AKERST SNHGVSALTTAES S S S STTGGTLMGVMRGS FGGTLGGAS S SAS VQMP P I LPSA 923 

Qy 351 NKRMVHFSPDSHHHDHWFSPGARTEHDQHQL-LRDNRAERGHKKNCSVRTASR--QTSMH 407 

: I I I M I I I I II : I : : M 

Db 924 SPASVSVSGSGRRRFSETPTAGPTHREQPQTSYRDRAAERRNLYGSSTSSGNDVIDSSED 983 

Qy 408 LGSLCTGDI KRRRKAAPLPGPTTAGFVG ENAQ P I L ENN I GN RMLQNM 454 

II I : : I I I II : I I M M M M M I 

Db 984 LMGL RKGSSDPTPFPPGVGGRGITTSTEVSSFDVITEERAIDESNVGNRMLRNM 1037 

Qy 455 GWTPGSGLGRDGKGISEPIQAMQRPKGLGLG 485 

II I I M I M I I : M M I : III 

Db 1038 GWHEGS GLGKDGS GMKEPVQAQGVDRRAGLG 1068 






GenCore version 5.1.6 
Copyright (c) 1993 - 2005 Compugen Ltd. 



OM protein - nucleic search, using frame_plus_p2n model

Run on:         December 11, 2005, 17:38:44 ; Search time 5807 Seconds
                (without alignments)
                4060.735 Million cell updates/sec

Title:          US-09-771-312-2
Perfect score:  2694
Sequence:       1 MEELVHDLVSALEESSEQAR GFPLPKSTSATTTPNAGKSA 504

Scoring table:  BLOSUM62
                Xgapop 10.0, Xgapext 0.5
                Ygapop 10.0, Ygapext 0.5
                Fgapop 6.0, Fgapext 7.0
                Delop 6.0, Delext 7.0

Searched:       41078325 seqs, 23393541228 residues

Total number of hits satisfying chosen parameters: 82156650

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
Listing first 45 summaries

Command line parameters: 
-MODEL=frame+_p2n.model -DEV=xlh
Q=/cgn2_l/USPTO_spool/US09771312/runat_01122005_145312_15071/app_query.fasta
47
-DB=EST -QFMT=fastap -SUFFIX=rst -MINMATCH=0.1 -LOOPCL=0 -LOOPEXT=0
-UNITS=bits -START=1 -END=-1 -MATRIX=blosum62 -TRANS=human40.cdi -LIST=45
-DOCALIGN=200 -THR_SCORE=pct -THR_MAX=100 -THR_MIN=0 -ALIGN=15 -MODE=LOCAL
-OUTFMT=pto -NORM=ext -HEAPSIZE=500 -MINLEN=0 -MAXLEN=2000000000
-USER=US09771312_@CGN_l_l_5315_@runat_01122005_145312_15071 -NCPU=6 -ICPU=3
-NO_MMAP -LARGEQUERY -NEG_SCORES=0 -WAIT -DSPBLOCK=100 -LONGLOG
-DEV_TIMEOUT=120 -WARN_TIMEOUT=30 -THREADS=1 -XGAPOP=10 -XGAPEXT=0.5 -FGAPOP=6
-FGAPEXT=7 -YGAPOP=10 -YGAPEXT=0.5 -DELOP=6 -DELEXT=7 
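
The command line passes each search setting as a `-KEY=VALUE` token, and the gap values (for example `-XGAPOP=10` and `-XGAPEXT=0.5`) match the Xgapop/Xgapext entries in the scoring-table header. A minimal sketch for turning such a line into a dictionary follows, assuming that bare tokens such as `-NO_MMAP` are on/off switches; `parse_gencore_params` is a hypothetical helper, not a GenCore utility.

```python
def parse_gencore_params(text: str) -> dict:
    """Hypothetical helper: split '-KEY=VALUE' tokens; bare '-FLAG' tokens map to True."""
    params = {}
    for token in text.split():
        if not token.startswith("-"):
            continue  # skip anything that is not an option token
        # partition splits on the first '=', so values like '-1' stay intact
        key, sep, value = token[1:].partition("=")
        params[key] = value if sep else True
    return params


run = parse_gencore_params(
    "-XGAPOP=10 -XGAPEXT=0.5 -YGAPOP=10 -YGAPEXT=0.5 "
    "-FGAPEXT=7 -DELOP=6 -DELEXT=7 -NO_MMAP -LIST=45 -END=-1"
)
print(float(run["XGAPOP"]), float(run["XGAPEXT"]), run["NO_MMAP"])  # 10.0 0.5 True
```

Values are kept as strings so that non-numeric settings (for example `-MATRIX=blosum62`) parse the same way; convert individual entries as needed.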



Database :      EST:*
                1: gb_est1:*
                2: gb_est2:*
                3: gb_est3:*
                4: gb_htc:*
                5: gb_est4:*
                6: gb_est5:*
                7: gb_est6:*
                8: gb_est7:*
                9: gb_gss1:*
                10: gb_gss2:*
                11: gb_gss3:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
RESULT 1
DQ052881 
LOCUS       DQ052881                1587 bp    DNA     linear   GSS 02-JUN-2005
DEFINITION  Homo sapiens FLJ10252 gene, VIRTUAL TRANSCRIPT, partial sequence,
            genomic survey sequence.
ACCESSION   DQ052881
VERSION     DQ052881.1  GI:66898828
KEYWORDS    GSS.
SOURCE      Homo sapiens (human)
  ORGANISM  Homo sapiens
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;
            Hominidae; Homo.
REFERENCE   1  (bases 1 to 1587)
  AUTHORS   Nielsen,R., Bustamante,C., Clark,A.G., Glanowski,S., Sackton,T.B.,
            Hubisz,M.J., Fledel-Alon,A., Tanenbaum,D.M., Civello,D.,
            White,T.J., Sninsky,J.J., Adams,M.D. and Cargill,M.
  TITLE     A Scan for Positively Selected Genes in the Genomes of Humans and
            Chimpanzees
  JOURNAL   (er) PLoS Biol. 3 (6), E170 (2005)
   PUBMED   15869325
REFERENCE   2  (bases 1 to 1587)
  AUTHORS   Nielsen,R., Bustamante,C., Clark,A.G., Glanowski,S., Sackton,T.B.,
            Hubisz,M.J., Fledel-Alon,A., Tanenbaum,D.M., Civello,D.,
            White,T.J., Sninsky,J.J., Adams,M.D. and Cargill,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (05-MAY-2005) Celera Genomics, 45 West Gude Drive,
            Rockville, MD 20850, USA
COMMENT     This sequence was made by sequencing genomic exons and ordering
            them based on alignment. Translation starts at the beginning of
            alignment.
FEATURES             Location/Qualifiers
     source          1..1587
                     /organism="Homo sapiens"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9606"
                     /chromosome="1"
     gene            <1..>1587
                     /gene="FLJ10252"
                     /locus_tag="HC13411"
ORIGIN



Alignment Scores:
Pred. No.:             2.81e-236    Length:        1587
Score:                 2694.00      Matches:       504
Percent Similarity:    100.00%      Conservative:  0
Best Local Similarity: 100.00%      Mismatches:    0
Query Match:           100.00%      Indels:        0
DB:                    11           Gaps:          0



US-09-771-312-2 (1-504) x DQ052881 (1-1587) 

Qy    1 MetGluGluLeuValHisAspLeuValSerAlaLeuGluGluSerSerGluGlnAlaArg 20
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   73 ATGGAGGAGCTGGTTCATGACCTTGTCTCAGCATTGGAAGAGAGCTCAGAGCAAGCTCGA 132

Qy   21 GlyGlyPheAlaGluThrGlyAspHisSerArgSerIleSerCysProLeuLysArgGln 40
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  133 GGTGGATTTGCTGAAACAGGAGACCATTCTCGAAGTATATCTTGCCCTCTGAAACGCCAG 192

Qy   41 AlaArgLysArgArgGlyArgLysArgArgSerTyrAsnValHisHisProTrpGluThr 60
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  193 GCAAGGAAAAGGAGAGGGAGAAAACGGAGGTCGTATAATGTGCATCACCCGTGGGAGACT 252

Qy   61 GlyHisCysLeuSerGluGlySerAspSerSerLeuGluGluProSerLysAspTyrArg 80
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  253 GGTCACTGCTTAAGTGAAGGCTCTGATTCTAGTTTAGAAGAACCAAGCAAGGACTATAGA 312

Qy   81 GluAsnHisAsnAsnAsnLysLysAspHisSerAspSerAspAspGlnMetLeuValAla 100
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  313 GAGAATCACAATAATAATAAAAAAGATCACAGTGACTCTGATGACCAAATGTTAGTAGCA 372

Qy  101 LysArgArgProSerSerAsnLeuAsnAsnAsnValArgGlyLysArgProLeuTrpHis 120
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  373 AAGCGCAGGCCGTCATCAAACTTAAATAATAATGTTCGAGGGAAAAGACCTCTATGGCAT 432

Qy  121 GluSerAspPheAlaValAspAsnValGlyAsnArgThrLeuArgArgArgArgLysVal 140
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  433 GAGTCTGATTTTGCTGTGGACAATGTTGGGAATAGAACTCTGCGCAGGAGGAGAAAGGTA 492

Qy  141 LysArgMetAlaValAspLeuProGlnAspIleSerAsnLysArgThrMetThrGlnPro 160
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  493 AAACGCATGGCAGTAGATCTCCCACAGGACATCTCTAACAAACGGACAATGACCCAGCCA 552

Qy  161 ProGluGlyCysArgAspGlnAspMetAspSerAspArgAlaTyrGlnTyrGlnGluPhe 180
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  553 CCTGAGGGTTGTAGAGATCAGGACATGGACAGTGATAGAGCCTACCAGTATCAAGAATTT 612

Qy  181 ThrLysAsnLysValLysLysArgLysLeuLysIleIleArgGlnGlyProLysIleGln 200
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  613 ACCAAGAACAAAGTCAAAAAAAGAAAGTTGAAAATAATCAGACAAGGACCAAAAATCCAA 672

Qy  201 AspGluGlyValValLeuGluSerGluGluThrAsnGlnThrAsnLysAspLysMetGlu 220
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  673 GATGAAGGAGTAGTTTTAGAAAGTGAGGAAACGAACCAGACCAATAAGGACAAAATGGAA 732

Qy  221 CysGluGluGlnLysValSerAspGluLeuMetSerGluSerAspSerSerSerLeuSer 240
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  733 TGTGAAGAGCAAAAAGTCTCAGATGAGCTCATGAGTGAAAGTGATTCCAGCAGTCTCAGC 792

Qy  241 SerThrAspAlaGlyLeuPheThrAsnAspGluGlyArgGlnGlyAspAspGluGlnSer 260
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  793 AGCACTGATGCTGGATTGTTTACCAATGATGAGGGAAGACAAGGTGATGATGAACAGAGT 852

Qy  261 AspTrpPheTyrGluLysGluSerGlyGlyAlaCysGlyIleThrGlyValValProTrp 280
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  853 GACTGGTTCTACGAAAAGGAATCAGGTGGAGCATGTGGTATCACTGGAGTTGTGCCCTGG 912

Qy  281 TrpGluLysGluAspProThrGluLeuAspLysAsnValProAspProValPheGluSer 300
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  913 TGGGAAAAGGAAGATCCTACTGAGCTAGACAAAAATGTACCAGATCCTGTCTTTGAAAGT 972

Qy  301 IleLeuThrGlySerPheProLeuMetSerHisProSerArgArgGlyPheGlnAlaArg 320
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  973 ATCTTAACTGGTTCTTTTCCCCTTATGTCACACCCAAGCAGAAGAGGTTTCCAAGCTAGA 1032

Qy  321 LeuSerArgLeuHisGlyMetSerSerLysAsnIleLysLysSerGlyGlyThrProThr 340
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1033 CTCAGTCGCCTTCATGGAATGTCTTCAAAGAATATTAAAAAATCTGGAGGGACTCCAACT 1092

Qy  341 SerMetValProIleProGlyProValGlyAsnLysArgMetValHisPheSerProAsp 360
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1093 TCAATGGTACCCATTCCTGGCCCAGTGGGTAACAAGAGAATGGTTCATTTTTCCCCGGAT 1152

Qy  361 SerHisHisHisAspHisTrpPheSerProGlyAlaArgThrGluHisAspGlnHisGln 380
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1153 TCTCATCACCATGACCATTGGTTTAGCCCTGGGGCTAGGACAGAGCATGACCAGCATCAG 1212

Qy  381 LeuLeuArgAspAsnArgAlaGluArgGlyHisLysLysAsnCysSerValArgThrAla 400
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1213 CTTCTGAGAGATAATCGAGCTGAAAGAGGACACAAGAAAAATTGTTCTGTGAGAACAGCC 1272

Qy  401 SerArgGlnThrSerMetHisLeuGlySerLeuCysThrGlyAspIleLysArgArgArg 420
        ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1273 AGCAGGCAAACAAGCATGCATTTAGGATCCTTATGCACGGGAGATATCAAACGGAGAAGA 1332

Qy  421 LysAlaAlaProLeuProGlyProThrThrAlaGlyPheValGlyGluAsnAlaGlnPro 440

I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
1333 AAAGCTGCACCTTTGCCTGGACCTACTACTGCAGGATTTGTAGGTGAAAATGCCCAGCCA 1392 



441 



460 



IleLeuGluAsnAsnlleGlyAsnArgMetLeuGlnAsnMetGlyTrpThrProGlySer 

I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I ! I I I II I I I I i 1 I I II I I I I 
1393 AT C C T AGAAAAT AAT AT T G GAAAC C GAAT G CT T C AGAAT AT G G G CT G GAC GCCTGGGT CA 1452 



Qy 4 61 GlyLeuGlyArgAspGlyLysGlylleSerGluProIleGlnAlaMetGlnArgProLys 480 

I I I I I I I I I I 1 I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1453 GGCCTTGGACGAGATGGCAAGGGGATCTCTGAGCCAATTCAAGCCATGCAGAGGCCAAAG 1512 



Qy 



Db 



481 GlyLeuGlyLeuGlyPheProLeuProLysSerThrSerAlaThrThrThrProAsnAla 5 00 
I I I I I I I I I I I I I I I I I I I I I I I I I M I I II I I I I I I I I I I I I I I I I I I I I I I II I I I I I 
1513 GGAT TAG GAC T T G GAT T T C CT C T AC C AAAAAGT ACT T C C G CAACT ACT AC C C C C AAT G CA 1572 



Qy 501 GlyLysSerAla 504 

I I I I II I i I I I I 
Db 1573 GGAAAATCCGCC 1584 



RESULT 2 
DQ052882 
LOCUS       DQ052882                1587 bp    DNA     linear   GSS 02-JUN-2005
DEFINITION  Pan troglodytes FLJ10252 gene, VIRTUAL TRANSCRIPT, partial
            sequence, genomic survey sequence.
ACCESSION   DQ052882
VERSION     DQ052882.1  GI:66898829
KEYWORDS    GSS.
SOURCE      Pan troglodytes (chimpanzee)
  ORGANISM  Pan troglodytes
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini;
            Hominidae; Pan.
REFERENCE   1  (bases 1 to 1587)
  AUTHORS   Nielsen,R., Bustamante,C., Clark,A.G., Glanowski,S., Sackton,T.B.,
            Hubisz,M.J., Fledel-Alon,A., Tanenbaum,D.M., Civello,D.,
            White,T.J., Sninsky,J.J., Adams,M.D. and Cargill,M.
  TITLE     A Scan for Positively Selected Genes in the Genomes of Humans and
            Chimpanzees
  JOURNAL   (er) PLoS Biol. 3 (6), E170 (2005)
   PUBMED   15869325
REFERENCE   2  (bases 1 to 1587)
  AUTHORS   Nielsen,R., Bustamante,C., Clark,A.G., Glanowski,S., Sackton,T.B.,
            Hubisz,M.J., Fledel-Alon,A., Tanenbaum,D.M., Civello,D.,
            White,T.J., Sninsky,J.J., Adams,M.D. and Cargill,M.
  TITLE     Direct Submission
  JOURNAL   Submitted (05-MAY-2005) Celera Genomics, 45 West Gude Drive,
            Rockville, MD 20850, USA
COMMENT     This sequence was made by sequencing genomic exons and ordering
            them based on alignment. Translation starts at the beginning of
            alignment.
FEATURES             Location/Qualifiers
     source          1..1587
                     /organism="Pan troglodytes"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9598"
     gene            <1..>1587
                     /gene="FLJ10252"
                     /locus_tag="HC13411"
ORIGIN








Alignment Scores:
Pred. No.:              8.94e-204   Length:        1587
Score:                  2340.00     Matches:       445
Percent Similarity:     88.29%      Conservative:  0
Best Local Similarity:  88.29%      Mismatches:    59
Query Match:            86.86%      Indels:        0
DB:                     11          Gaps:          0
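The two percentages in this block follow from the other fields: Query Match is the hit's score as a fraction of the query's perfect score (2694, the query aligned against itself), and Best Local Similarity here equals matches over aligned positions. A minimal Python sketch of that arithmetic, assuming that the denominator of the similarity is every aligned position (matches + conservative + mismatches + indels); the report does not state the program's exact formulas, and the function names are hypothetical.

```python
def query_match(score: float, perfect_score: float) -> float:
    """Hit score as a percentage of the query's perfect (self-alignment) score."""
    return 100.0 * score / perfect_score


def local_similarity(matches: int, conservative: int,
                     mismatches: int, indels: int) -> float:
    """Identical plus conservative positions as a percentage of aligned positions.

    Assumption: every aligned query position is counted once, so the
    denominator is matches + conservative + mismatches + indels.
    """
    aligned = matches + conservative + mismatches + indels
    return 100.0 * (matches + conservative) / aligned


# Values from the DQ052882 block; 2694 is the perfect score of the query.
print(f"Query Match: {query_match(2340.00, 2694):.2f}%")                 # 86.86%
print(f"Best Local Similarity: {local_similarity(445, 0, 59, 0):.2f}%")  # 88.29%
```

The same ratio is consistent with the % Query Match column of the protein summary further on, for example 2310 / 2694 rounds to 85.7%.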



US-09-771-312-2 (1-504) x DQ052882 (1-1587)

Qy     1 MetGluGluLeuValHisAspLeuValSerAlaLeuGluGluSerSerGluGlnAlaArg 20
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    73 ATGGAGGAGCTGGTTCATGACCTTGTCTCAGCATTAGAAGAGAGCTCAGAGCAAGCTCGA 132

Qy    21 GlyGlyPheAlaGluThrGlyAspHisSerArgSerIleSerCysProLeuLysArgGln 40
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   133 GGTGGATTTGCTGAAACAGGAGACCATTCTCGAAGTATATCTTGCCCTCTGAAACGCCAG 192

Qy    41 AlaArgLysArgArgGlyArgLysArgArgSerTyrAsnValHisHisProTrpGluThr 60
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   193 GCAAGGAAAAGGAGAGGGAGAAAACGGAGGTCGTATAATGTGCATCACCCGTGGGAGACT 252

Qy    61 GlyHisCysLeuSerGluGlySerAspSerSerLeuGluGluProSerLysAspTyrArg 80
         |||   |||||||||||||||||||||      |||||||||||||||||||||||||||
Db   253 GGTNNNTGCTTAAGTGAAGGCTCTGATTNNNGTTTAGAAGAACCNAGCAAGGACTATAGA 312

Qy    81 GluAsnHisAsnAsnAsnLysLysAspHisSerAspSerAspAspGlnMetLeuValAla 100
         ||||||      |||||||||||||||   |||||||||||||||||||||
Db   313 GAGAATNNNNNTAATAATAAAAAAGATNNNAGTGACTCTGATGACCAAATGTTNNNNNCA 372

Qy   101 LysArgArgProSerSerAsnLeuAsnAsnAsnValArgGlyLysArgProLeuTrpHis 120
         |||||||||   |||||||||||||||||||||||||||         |||||||||
Db   373 AAGCGCAGGNNATCATCAAACTTAAATAATAATGTTCGANNGAANNNNCCTCTATGGCAN 432

Qy   121 GluSerAspPheAlaValAspAsnValGlyAsnArgThrLeuArgArgArgArgLysVal 140
               |||||||||   |||   ||||||
Db   433 GNNNNTGATTTTGCTGNNGACNNTGTTGGGNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN 492

Qy   141 LysArgMetAlaValAspLeuProGlnAspIleSerAsnLysArgThrMetThrGlnPro 160
Db   493 NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN 552

Qy   161 ProGluGlyCysArgAspGlnAspMetAspSerAspArgAlaTyrGlnTyrGlnGluPhe 180
                  |||||||||||||||||||||||||||||||||||||||||||||||||||
Db   553 NNNNNNNNTTGTAGAGATCAGGACATGGACAGTGATAGAGCCTACCAGTATCAAGAATTT 612

Qy   181 ThrLysAsnLysValLysLysArgLysLeuLysIleIleArgGlnGlyProLysIleGln 200
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   613 ACCAAGAACAAAGTCAAAAAAAGAAAGTTGAAAATAATCAGACAAGGACCAAAAATCCAA 672

Qy   201 AspGluGlyValValLeuGluSerGluGluThrAsnGlnThrAsnLysAspLysMetGlu 220
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   673 GATGAAGGAGTAGTTTTAGAAAGTGAGGAAACGAACCAGACCAATAAGGACAAAATGGAA 732

Qy   221 CysGluGluGlnLysValSerAspGluLeuMetSerGluSerAspSerSerSerLeuSer 240
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   733 TGTGAAGAGCAAAAAGTCTCAGATGAGCTCATGAGTGAAAGTGATTCCAGCAGTCTCAGC 792

Qy   241 SerThrAspAlaGlyLeuPheThrAsnAspGluGlyArgGlnGlyAspAspGluGlnSer 260
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   793 AGCACTGATGCTGGATTGTTTACCAATGATGAGGGAAGACAAGGTGATGATGAACAGAGT 852

Qy   261 AspTrpPheTyrGluLysGluSerGlyGlyAlaCysGlyIleThrGlyValValProTrp 280
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   853 GACTGGTTCTACGAAAAGGAATCAGGTGGAGCATGTGGTATCACTGGAGTTGTGCCCTGG 912

Qy   281 TrpGluLysGluAspProThrGluLeuAspLysAsnValProAspProValPheGluSer 300
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   913 TGGGAAAAGGAAGATCCTACTGAGCTAGACAAAAATGTACCAGATCCTGTCTTTGAAAGT 972

Qy   301 IleLeuThrGlySerPheProLeuMetSerHisProSerArgArgGlyPheGlnAlaArg 320
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   973 ATCTTAACTGGTTCTTTTCCCCTTATGTCACACCCAAGCAGAAGAGGTTTCCAAGCTAGA 1032

Qy   321 LeuSerArgLeuHisGlyMetSerSerLysAsnIleLysLysSerGlyGlyThrProThr 340
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1033 CTCAGTCGCCTTCATGGAATGTCTTCAAAGAATATTAAAAAATCTGGAGGGACTCCAACT 1092

Qy   341 SerMetValProIleProGlyProValGlyAsnLysArgMetValHisPheSerProAsp 360
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1093 TCAATGGTACCCATTCCTGGCCCAGTGGGTAACAAGAGAATGGTTCATTTTTCCCCGGAT 1152

Qy   361 SerHisHisHisAspHisTrpPheSerProGlyAlaArgThrGluHisAspGlnHisGln 380
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1153 TCTCATCACCATGACCATTGGTTTAGCCCTGGGGCTAGGACAGAGCATGACCAGCATCAG 1212

Qy   381 LeuLeuArgAspAsnArgAlaGluArgGlyHisLysLysAsnCysSerValArgThrAla 400
         |||||||||||||||||||||||||||||||||||||||||||||
Db  1213 CTNCTGAGAGATAATCGAGCTGAAAGAGGACACAAGAAAAATTGTTNNNNNNNNNNNNNN 1272

Qy   401 SerArgGlnThrSerMetHisLeuGlySerLeuCysThrGlyAspIleLysArgArgArg 420
               ||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1273 NNNNNGCAAACAAGCATGCATTTAGGATCCTTATGCACGGGAGATATCAAACGGAGAAGA 1332

Qy   421 LysAlaAlaProLeuProGlyProThrThrAlaGlyPheValGlyGluAsnAlaGlnPro 440
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1333 AAAGCTGCACCTTTGCCTGGACCTACTACTGCAGGATTTGTAGGTGAAAATGCCCAGCCA 1392

Qy   441 IleLeuGluAsnAsnIleGlyAsnArgMetLeuGlnAsnMetGlyTrpThrProGlySer 460
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1393 ATCCTAGAAAATAATATTGGAAACCGAATGCTTCAGAATATGGGCTGGACGCCTGGGTCA 1452

Qy   461 GlyLeuGlyArgAspGlyLysGlyIleSerGluProIleGlnAlaMetGlnArgProLys 480
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  1453 GGCCTTGGACGAGATGGCAAGGGGATCTCTGAGCCAATTCAAGCCATGCAGAGGCCAAAG 1512

Qy   481 GlyLeuGlyLeuGlyPheProLeuProLysSerThrSerAlaThrThrThrProAsnAla 500
         |||||||||||||||||||||||||||||||||||||||||||||   ||||||||||||
Db  1513 GGATTAGGACTTGGATTTCCTCTACCAAAAAGTACTTCCGCAACTGCTACCCCCAATGCA 1572

Qy   501 GlyLysSerAla 504
         ||||||||||||
Db  1573 GGAAAATCCGCC 1584
OM protein - protein search, using sw model

Run on: December 4, 2005, 10:07:25 ; Search time 230 Seconds 

(without alignments) 
1546.027 Million cell updates/sec 

Title: US-09-771-312-2 
Perfect score: 2694 

Sequence: 1 MEELVHDLVSALEESSEQAR GFPLPKSTSATTTPNAGKSA 504 

Scoring table: BLOSUM62 

Gapop 10.0 , Gapext 0.5 
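The header names a Smith-Waterman ("sw") search scored with BLOSUM62, a gap-open penalty of 10.0 and a gap-extension penalty of 0.5. A minimal sketch of local alignment scoring with affine gaps (Gotoh's recurrences), assuming that a gap of length k costs Gapop + (k - 1) * Gapext and using a plain match/mismatch score in place of BLOSUM62 to keep it short; both are assumptions, not the exact scoring of this search, and `sw_affine` is a hypothetical name.

```python
def sw_affine(a: str, b: str, match: float = 5.0, mismatch: float = -4.0,
              gap_open: float = 10.0, gap_ext: float = 0.5) -> float:
    """Best local alignment score of a and b with affine gap penalties.

    A gap of length k costs gap_open + (k - 1) * gap_ext (assumed convention).
    A simple match/mismatch score stands in for a substitution matrix.
    """
    neg = float("-inf")
    n, m = len(a), len(b)
    # H: best score of any local alignment ending at (i, j).
    # E: best score ending with a gap in a (consumes b); F: gap in b (consumes a).
    H = [[0.0] * (m + 1) for _ in range(n + 1)]
    E = [[neg] * (m + 1) for _ in range(n + 1)]
    F = [[neg] * (m + 1) for _ in range(n + 1)]
    best = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - gap_open, E[i][j - 1] - gap_ext)
            F[i][j] = max(H[i - 1][j] - gap_open, F[i - 1][j] - gap_ext)
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0.0, H[i - 1][j - 1] + s, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best
```

With a real substitution matrix, the line computing `s` would instead look up the residue pair (for example in BLOSUM62); the recurrences are unchanged.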

Searched: 2166443 seqs, 705528306 residues 

Total number of hits satisfying chosen parameters: 2166443 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 

Database : UniProt_05.80:*

1: uniprot_sprot:*
2: uniprot_trembl:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
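The report says only that Pred. No. is derived from the total score distribution. One common approach, assumed here and not confirmed by the report, is to fit an extreme-value (Gumbel) distribution to the scores of the whole search and multiply its tail probability by the number of sequences searched (2166443 in this run). The parameters `mu` and `beta` below stand for hypothetical fit results, and `predicted_number` is a hypothetical helper.

```python
import math


def predicted_number(score: float, n_seqs: int, mu: float, beta: float) -> float:
    """Expected number of chance hits scoring >= score among n_seqs sequences.

    Assumes the best score of an unrelated sequence follows a Gumbel
    (type-I extreme value) distribution with location mu and scale beta:
        P(S >= s) = 1 - exp(-exp(-(s - mu) / beta))
    """
    z = (score - mu) / beta
    if z < -700:  # exp(-z) would overflow; the tail probability is then ~1
        return float(n_seqs)
    # -expm1(-x) evaluates 1 - exp(-x) accurately even when x is tiny.
    return n_seqs * -math.expm1(-math.exp(-z))
```

For large scores the result behaves like n_seqs * exp(-(score - mu) / beta), which is why very high scores give the extremely small Pred. No. values seen in the alignments.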

SUMMARIES

Result           % Query
  No.    Score     Match  Length  DB  ID            Description
    1     2694     100.0     528   1  GPTC2_HUMAN   Q9nw75 homo sapien
    2     2694     100.0     528   2  Q5VYK7_HUMAN  Q5vyk7 homo sapien
    3     2310      85.7     527   1  GPTC2_MOUSE   Q7tqc7 mus musculu
    4   2009.5      74.6     504   2  Q5F3Y2_CHICK  Q5f3y2 gallus gall
    5     1813      67.3     376   2  Q5VYK8_HUMAN  Q5vyk8 homo sapien
    6   1587.5      58.9     414   2  Q4V7S5_XENLA  Q4v7s5 xenopus lae
    7     1538      57.1     375   2  Q9D3E7_MOUSE  Q9d3e7 mus musculu
    8   1513.5      56.2     410   2  Q6AYL5_RAT    Q6ayl5 rattus norv
    9   1283.5      47.6     561   2  Q4RRB2_TETNG  Q4rrb2 tetraodon n
   10     1008      37.4     216   2  Q6PIX0_HUMAN  Q6pix0 homo sapien
   11    939.5      34.9     408   2  Q5RJ37_BRARE  Q5rj37 brachydanio
   12    627.5      23.3     482   1  CN118_MOUSE   Q6pe65 mus musculu
   13    593.5      22.0     482   2  Q9H3M3_HUMAN  Q9h3m3 homo sapien
   14    545.5      20.2     453   1  CN118_HUMAN   Q9nwq4 homo sapien
   15      438      16.3     467   2  Q4RLV5_TETNG  Q4rlv5 tetraodon n
   16      430      16.0     107   2  Q9CSX3_MOUSE  Q9csx3 mus musculu
   17    320.5      11.9     221   2  Q9ULA8_HUMAN  Q9ula8 homo sapien
   18      197       7.3     928   2  Q6H4V9_ORYSA  Q6h4v9 oryza sativ
   19    177.5       6.6    1007   2  Q8VYR8_ARATH  Q8vyr8 arabidopsis
   20    177.5       6.6    1105   2  Q9M383_ARATH  Q9m383 arabidopsis
   21      170       6.3     812   2  Q6C233_YARLI  Q6c233 yarrowia li
   22      169       6.3     742   2  Q6Z2C8_ORYSA  Q6z2c8 oryza sativ
   23    167.5       6.2     767   1  YNW4_YEAST    P53866 saccharomyc
   24    166.5       6.2    1469   2  Q5KKE0_CRYNE  Q5kke0 cryptococcu
   25      165       6.1     732   2  Q5KCU3_CRYNE  Q5kcu3 cryptococcu
   26      165       6.1     732   2  Q55IV9_CRYNE  Q55iv9 cryptococcu
   27    164.5       6.1     781   2  Q9SF87_ARATH  Q9sf87 arabidopsis
   28    163.5       6.1     346   2  Q5EB71_RAT    Q5eb71 rattus norv
   29    162.5       6.0     505   2  Q59HE6_HUMAN  Q59he6 homo sapien
   30    162.5       6.0     815   1  RBM5_HUMAN    P52756 homo sapien
   31    162.5       6.0    1067   1  SFR14_MOUSE   Q8ch09 mus musculu
   32      162       6.0     520   2  Q99KV9_MOUSE  Q99kv9 mus musculu
   33      162       6.0     815   2  Q91YE7_MOUSE  Q91ye7 mus musculu
   34    160.5       6.0     749   2  Q6DDU9_XENLA  Q6ddu9 xenopus lae
   35    158.5       5.9    1469   2  Q55VU9_CRYNE  Q55vu9 cryptococcu
   36    157.5       5.8     808   2  Q6BYP9_DEBHA  Q6byp9 debaryomyce
   37    157.5       5.8     852   1  RBM10_RAT     P70501 rattus norv
   38    157.5       5.8     853   2  Q8BTP8_MOUSE  Q8btp8 mus musculu
   39    157.5       5.8     857   2  Q80U75_MOUSE  Q80u75 mus musculu
   40    157.5       5.8     930   2  Q99KG3_MOUSE  Q99kg3 mus musculu
   41      156       5.8     542   2  Q9NTB1_HUMAN  Q9ntb1 homo sapien
   42      156       5.8     705   2  Q59UG4_CANAL  Q59ug4 candida alb
   43      156       5.8     852   2  Q9BTX0_HUMAN  Q9btx0 homo sapien
   44      156       5.8     853   2  Q5JRR2_HUMAN  Q5jrr2 homo sapien
   45      156       5.8     929   1  RBM10_HUMAN   P98175 homo sapien



ALIGNMENTS 



RESULT 1 
GPTC2_HUMAN 

ID GPTC2_HUMAN STANDARD; PRT; 528 AA.
AC Q9NW75; Q86YE7;
DT 05-JUL-2004 (Rel. 44, Created)
DT 05-JUL-2004 (Rel. 44, Last sequence update)
DT 10-MAY-2005 (Rel. 47, Last annotation update)
DE G patch domain containing protein 2.
GN Name=GPATC2;
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae;
OC Homo.
OX NCBI_TaxID=9606;

RN [1]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 1).
RC TISSUE=Embryo;
RX PubMed=14702039; DOI=10.1038/ng1285;
RA Ota T., Suzuki Y., Nishikawa T., Otsuki T., Sugiyama T., Irie R.,
RA Wakamatsu A., Hayashi K., Sato H., Nagai K., Kimura K., Makita H.,
RA Sekine M., Obayashi M., Nishi T., Shibahara T., Tanaka T., Ishii S.,
RA Yamamoto J.-I., Saito K., Kawai Y., Isono Y., Nakamura Y.,
RA Nagahari K., Murakami K., Yasuda T., Iwayanagi T., Wagatsuma M.,
RA Shiratori A., Sudo H., Hosoiri T., Kaku Y., Kodaira H., Kondo H.,
RA Sugawara M., Takahashi M., Kanda K., Yokoi T., Furuya T., Kikkawa E.,
RA Omura Y., Abe K., Kamihara K., Katsuta N., Sato K., Tanikawa M.,
RA Yamazaki M., Ninomiya K., Ishibashi T., Yamashita H., Murakawa K.,
RA Fujimori K., Tanai H., Kimata M., Watanabe M., Hiraoka S., Chiba Y.,
RA Ishida S., Ono Y., Takiguchi S., Watanabe S., Yosida M., Hotuta T.,
RA Kusano J., Kanehori K., Takahashi-Fujii A., Hara H., Tanase T.-O.,
RA Nomura Y., Togiya S., Komai F., Hara R., Takeuchi K., Arita M.,
RA Imose N., Musashino K., Yuuki H., Oshima A., Sasaki N., Aotsuka S.,
RA Yoshikawa Y., Matsunawa H., Ichihara T., Shiohata N., Sano S.,
RA Moriya S., Momiyama H., Satoh N., Takami S., Terashima Y., Suzuki O.,
RA Nakagawa S., Senoh A., Mizoguchi H., Goto Y., Shimizu F., Wakebe H.,
RA Hishigaki H., Watanabe T., Sugiyama A., Takemoto M., Kawakami B.,
RA Yamazaki M., Watanabe K., Kumagai A., Itakura S., Fukuzumi Y.,
RA Fujimori Y., Komiyama M., Tashiro H., Tanigami A., Fujiwara T.,
RA Ono T., Yamada K., Fujii Y., Ozaki K., Hirao M., Ohmori Y.,
RA Kawabata A., Hikiji T., Kobatake N., Inagaki H., Ikema Y., Okamoto S.,
RA Okitani R., Kawakami T., Noguchi S., Itoh T., Shigeta K., Senba T.,
RA Matsumura K., Nakajima Y., Mizuno T., Morinaga M., Sasaki M.,
RA Togashi T., Oyama M., Hata H., Watanabe M., Komatsu T.,
RA Mizushima-Sugano J., Satoh T., Shirai Y., Takahashi Y., Nakagawa K.,
RA Okumura K., Nagase T., Nomura N., Kikuchi H., Masuho Y., Yamashita R.,
RA Nakai K., Yada T., Nakamura Y., Ohara O., Isogai T., Sugano S.;
RT "Complete sequencing and characterization of 21,243 full-length human
RT cDNAs.";
RL Nat. Genet. 36:40-45(2004).

RN [2]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 2).
RC TISSUE=Lung, and uterus;
RX MEDLINE=22388257; PubMed=12477932; DOI=10.1073/pnas.242603899;
RA Strausberg R.L., Feingold E.A., Grouse L.H., Derge J.G.,
RA Klausner R.D., Collins F.S., Wagner L., Shenmen C.M., Schuler G.D.,
RA Altschul S.F., Zeeberg B., Buetow K.H., Schaefer C.F., Bhat N.K.,
RA Hopkins R.F., Jordan H., Moore T., Max S.I., Wang J., Hsieh F.,
RA Diatchenko L., Marusina K., Farmer A.A., Rubin G.M., Hong L.,
RA Stapleton M., Soares M.B., Bonaldo M.F., Casavant T.L., Scheetz T.E.,
RA Brownstein M.J., Usdin T.B., Toshiyuki S., Carninci P., Prange C.,
RA Raha S.S., Loquellano N.A., Peters G.J., Abramson R.D., Mullahy S.J.,
RA Bosak S.A., McEwan P.J., McKernan K.J., Malek J.A., Gunaratne P.H.,
RA Richards S., Worley K.C., Hale S., Garcia A.M., Gay L.J., Hulyk S.W.,
RA Villalon D.K., Muzny D.M., Sodergren E.J., Lu X., Gibbs R.A.,
RA Fahey J., Helton E., Ketteman M., Madan A., Rodrigues S., Sanchez A.,
RA Whiting M., Madan A., Young A.C., Shevchenko Y., Bouffard G.G.,
RA Blakesley R.W., Touchman J.W., Green E.D., Dickson M.C.,
RA Rodriguez A.C., Grimwood J., Schmutz J., Myers R.M.,
RA Butterfield Y.S.N., Krzywinski M.I., Skalska U., Smailus D.E.,
RA Schnerch A., Schein J.E., Jones S.J.M., Marra M.A.;
RT "Generation and initial analysis of more than 15,000 full-length human
RT and mouse cDNA sequences.";
RL Proc. Natl. Acad. Sci. U.S.A. 99:16899-16903(2002).

CC -!- ALTERNATIVE PRODUCTS:
CC Event=Alternative splicing; Named isoforms=2;
CC Name=1;
CC IsoId=Q9NW75-1; Sequence=Displayed;
CC Name=2;
CC IsoId=Q9NW75-2; Sequence=VSP_010527, VSP_010528;
CC Note=No experimental confirmation available;
CC -!- SIMILARITY: Contains 1 G-patch domain.
CC --------------------------------------------------------------------------
CC This Swiss-Prot entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use as long as its content is in no way modified and this statement is not
CC removed.
CC --------------------------------------------------------------------------

DR EMBL; AK001114; BAA91509.1; -; mRNA.
DR EMBL; BC042193; AAH42193.1; -; mRNA.
DR EMBL; BC063474; AAH63474.1; -; mRNA.
DR Ensembl; ENSG00000092978; Homo sapiens.
DR HGNC; HGNC:25499; GPATC2.
DR InterPro; IPR000467; G_patch.
DR Pfam; PF01585; G-patch; 1.
DR SMART; SM00443; G_patch; 1.
DR PROSITE; PS50174; G_PATCH; 1.
KW Alternative splicing.
FT DOMAIN 467 513 G-patch.
FT VARSPLIC 367 376 VPIPGPVGNK -> ATNWTSEIPL (in isoform 2).
FT /FTId=VSP_010527.
FT VARSPLIC 377 528 Missing (in isoform 2).
FT /FTId=VSP_010528.
FT CONFLICT 220 220 G -> A (in Ref. 2; AAH63474).
FT CONFLICT 225 225 D -> N (in Ref. 2; AAH42193).
SQ SEQUENCE 528 AA; 58944 MW; 472143144700DC26 CRC64;



Query Match 100.0%; Score 2694; DB 1; Length 528;
Best Local Similarity 100.0%; Pred. No. 1.5e-153;
Matches 504; Conservative 0; Mismatches 0; Indels 0; Gaps 0;



Qy     1 MEELVHDLVSALEESSEQARGGFAETGDHSRSISCPLKRQARKRRGRKRRSYNVHHPWET 60
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    25 MEELVHDLVSALEESSEQARGGFAETGDHSRSISCPLKRQARKRRGRKRRSYNVHHPWET 84

Qy    61 GHCLSEGSDSSLEEPSKDYRENHNNNKKDHSDSDDQMLVAKRRPSSNLNNNVRGKRPLWH 120
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    85 GHCLSEGSDSSLEEPSKDYRENHNNNKKDHSDSDDQMLVAKRRPSSNLNNNVRGKRPLWH 144

Qy   121 ESDFAVDNVGNRTLRRRRKVKRMAVDLPQDISNKRTMTQPPEGCRDQDMDSDRAYQYQEF 180
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   145 ESDFAVDNVGNRTLRRRRKVKRMAVDLPQDISNKRTMTQPPEGCRDQDMDSDRAYQYQEF 204

Qy   181 TKNKVKKRKLKIIRQGPKIQDEGVVLESEETNQTNKDKMECEEQKVSDELMSESDSSSLS 240
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   205 TKNKVKKRKLKIIRQGPKIQDEGVVLESEETNQTNKDKMECEEQKVSDELMSESDSSSLS 264

Qy   241 STDAGLFTNDEGRQGDDEQSDWFYEKESGGACGITGVVPWWEKEDPTELDKNVPDPVFES 300
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   265 STDAGLFTNDEGRQGDDEQSDWFYEKESGGACGITGVVPWWEKEDPTELDKNVPDPVFES 324

Qy   301 ILTGSFPLMSHPSRRGFQARLSRLHGMSSKNIKKSGGTPTSMVPIPGPVGNKRMVHFSPD 360
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   325 ILTGSFPLMSHPSRRGFQARLSRLHGMSSKNIKKSGGTPTSMVPIPGPVGNKRMVHFSPD 384

Qy   361 SHHHDHWFSPGARTEHDQHQLLRDNRAERGHKKNCSVRTASRQTSMHLGSLCTGDIKRRR 420
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   385 SHHHDHWFSPGARTEHDQHQLLRDNRAERGHKKNCSVRTASRQTSMHLGSLCTGDIKRRR 444

Qy   421 KAAPLPGPTTAGFVGENAQPILENNIGNRMLQNMGWTPGSGLGRDGKGISEPIQAMQRPK 480
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   445 KAAPLPGPTTAGFVGENAQPILENNIGNRMLQNMGWTPGSGLGRDGKGISEPIQAMQRPK 504

Qy   481 GLGLGFPLPKSTSATTTPNAGKSA 504
         ||||||||||||||||||||||||
Db   505 GLGLGFPLPKSTSATTTPNAGKSA 528
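The FT VARSPLIC lines of this entry describe isoform 2 relative to the displayed sequence: residues 367-376 (VPIPGPVGNK) are replaced by ATNWTSEIPL and residues 377-528 are missing. A small Python sketch of applying such features, assuming 1-based inclusive coordinates as written in the FT lines; `apply_varsplic` and `isoform` are hypothetical helper names, not part of any Swiss-Prot tool.

```python
from typing import List, Optional, Tuple


def apply_varsplic(seq: str, start: int, end: int, replacement: str,
                   expected: Optional[str] = None) -> str:
    """Replace residues start..end (1-based, inclusive) with `replacement`.

    A feature reading "Missing" corresponds to replacement == "". If
    `expected` is given, the residues being replaced must match it.
    """
    if not 1 <= start <= end <= len(seq):
        raise ValueError(f"range {start}-{end} is outside a {len(seq)}-residue sequence")
    if expected is not None and seq[start - 1:end] != expected:
        raise ValueError(f"residues {start}-{end} are {seq[start - 1:end]!r}, not {expected!r}")
    return seq[:start - 1] + replacement + seq[end:]


def isoform(seq: str, features: List[Tuple[int, int, str]]) -> str:
    """Apply several (start, end, replacement) edits, right-most first,
    so the coordinates of the remaining features are not shifted."""
    for start, end, replacement in sorted(features, reverse=True):
        seq = apply_varsplic(seq, start, end, replacement)
    return seq


# Placeholder residues (X, Z) stand in for parts of the entry not needed here.
seq = "X" * 366 + "VPIPGPVGNK" + "Z" * 152
iso2 = isoform(seq, [(367, 376, "ATNWTSEIPL"), (377, 528, "")])
print(len(seq), len(iso2))
```

In the alignment above, entry positions 367-376 correspond to query positions 343-352 (the alignment is offset by 24), and the query indeed reads VPIPGPVGNK there.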





RESULT 2 
Q5VYK7_HUMAN 



ID Q5VYK7_HUMAN PRELIMINARY; PRT; 528 AA.
AC Q5VYK7;
DT 01-FEB-2005 (TrEMBLrel. 29, Created)
DT 01-FEB-2005 (TrEMBLrel. 29, Last sequence update)
DT 01-FEB-2005 (TrEMBLrel. 29, Last annotation update)
DE Novel protein.
GN Name=RP11-361K17.1; ORFNames=RP11-361K17.1-001;
OS Homo sapiens (Human).
OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae;
OC Homo.
OX NCBI_TaxID=9606;
RN [1]
RP NUCLEOTIDE SEQUENCE.
RA Griffiths C.;
RL Submitted (MAY-2005) to the EMBL/GenBank/DDBJ databases.
DR EMBL; AL354659; CAH70664.1; -; Genomic_DNA.
DR EMBL; AC096641; CAH70664.1; JOINED; Genomic_DNA.
DR GO; GO:0005622; C:intracellular; IEA.
DR GO; GO:0003676; F:nucleic acid binding; IEA.
SQ SEQUENCE 528 AA; 58943 MW; 472143144700DC26 CRC64;

Query Match 100.0%; Score 2694; DB 2; Length 528;
Best Local Similarity 100.0%; Pred. No. 1.5e-153;
Matches 504; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy     1 MEELVHDLVSALEESSEQARGGFAETGDHSRSISCPLKRQARKRRGRKRRSYNVHHPWET 60
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    25 MEELVHDLVSALEESSEQARGGFAETGDHSRSISCPLKRQARKRRGRKRRSYNVHHPWET 84

Qy    61 GHCLSEGSDSSLEEPSKDYRENHNNNKKDHSDSDDQMLVAKRRPSSNLNNNVRGKRPLWH 120
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db    85 GHCLSEGSDSSLEEPSKDYRENHNNNKKDHSDSDDQMLVAKRRPSSNLNNNVRGKRPLWH 144

Qy   121 ESDFAVDNVGNRTLRRRRKVKRMAVDLPQDISNKRTMTQPPEGCRDQDMDSDRAYQYQEF 180
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   145 ESDFAVDNVGNRTLRRRRKVKRMAVDLPQDISNKRTMTQPPEGCRDQDMDSDRAYQYQEF 204

Qy   181 TKNKVKKRKLKIIRQGPKIQDEGVVLESEETNQTNKDKMECEEQKVSDELMSESDSSSLS 240
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   205 TKNKVKKRKLKIIRQGPKIQDEGVVLESEETNQTNKDKMECEEQKVSDELMSESDSSSLS 264

Qy   241 STDAGLFTNDEGRQGDDEQSDWFYEKESGGACGITGVVPWWEKEDPTELDKNVPDPVFES 300
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   265 STDAGLFTNDEGRQGDDEQSDWFYEKESGGACGITGVVPWWEKEDPTELDKNVPDPVFES 324

Qy   301 ILTGSFPLMSHPSRRGFQARLSRLHGMSSKNIKKSGGTPTSMVPIPGPVGNKRMVHFSPD 360
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   325 ILTGSFPLMSHPSRRGFQARLSRLHGMSSKNIKKSGGTPTSMVPIPGPVGNKRMVHFSPD 384

Qy   361 SHHHDHWFSPGARTEHDQHQLLRDNRAERGHKKNCSVRTASRQTSMHLGSLCTGDIKRRR 420
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   385 SHHHDHWFSPGARTEHDQHQLLRDNRAERGHKKNCSVRTASRQTSMHLGSLCTGDIKRRR 444

Qy   421 KAAPLPGPTTAGFVGENAQPILENNIGNRMLQNMGWTPGSGLGRDGKGISEPIQAMQRPK 480
         ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   445 KAAPLPGPTTAGFVGENAQPILENNIGNRMLQNMGWTPGSGLGRDGKGISEPIQAMQRPK 504

Qy   481 GLGLGFPLPKSTSATTTPNAGKSA 504
         ||||||||||||||||||||||||
Db   505 GLGLGFPLPKSTSATTTPNAGKSA 528
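GPTC2_HUMAN and Q5VYK7_HUMAN report the same length (528 AA) and the same CRC64 checksum (472143144700DC26), which is what one would expect if the two sequences are identical, in line with the 100% matches in both alignments. A sketch of a Swiss-Prot-style CRC64, assuming the 64-bit ISO polynomial x^64 + x^4 + x^3 + x + 1 in reflected form, a zero initial value and no final XOR; it only shows how such a checksum is formed and cannot be checked against the value above here, because residues 1-24 of the entries are not shown in this report. `crc64_hex` is a hypothetical name.

```python
from typing import List


def _crc64_table() -> List[int]:
    """Lookup table for the reflected form of x^64 + x^4 + x^3 + x + 1."""
    poly = 0xD800000000000000
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _crc64_table()


def crc64_hex(sequence: str) -> str:
    """CRC64 of a sequence string as 16 upper-case hex digits.

    Assumes a zero initial value and no final XOR, one byte per residue.
    """
    crc = 0
    for byte in sequence.encode("ascii"):
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return f"{crc:016X}"
```

Because a CRC changes whenever a single residue changes, equal checksums together with equal lengths are a quick, though not absolute, sign that two entries carry the same sequence.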

RESULT 3 
GPTC2_MOUSE 

ID GPTC2_MOUSE STANDARD; PRT; 527 AA.
AC Q7TQC7; Q8BNJ9; Q8BPM1; Q8CDH9;
DT 05-JUL-2004 (Rel. 44, Created)
DT 05-JUL-2004 (Rel. 44, Last sequence update)
DT 13-SEP-2005 (Rel. 48, Last annotation update)
DE G patch domain containing protein 2.
GN Name=Gpatc2;
OS Mus musculus (Mouse).



OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Euarchontoglires; Glires; Rodentia; Sciurognathi;
OC Muroidea; Muridae; Murinae; Mus.
OX NCBI_TaxID=10090;

RN [1]
RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORMS 2 AND 3).
RC STRAIN=C57BL/6J; TISSUE=Eye, and Testis;
RX MEDLINE=22354683; PubMed=12466851; DOI=10.1038/nature01266;
RA Okazaki Y., Furuno M., Kasukawa T., Adachi J., Bono H., Kondo S.,
RA Nikaido I., Osato N., Saito R., Suzuki H., Yamanaka I., Kiyosawa H.,
RA Yagi K., Tomaru Y., Hasegawa Y., Nogami A., Schonbach C., Gojobori T.,
RA Baldarelli R., Hill D.P., Bult C., Hume D.A., Quackenbush J.,
RA Schriml L.M., Kanapin A., Matsuda H., Batalov S., Beisel K.W.,
RA Blake J.A., Bradt D., Brusic V., Chothia C., Corbani L.E., Cousins S.,
RA Dalla E., Dragani T.A., Fletcher C.F., Forrest A., Frazer K.S.,
RA Gaasterland T., Gariboldi M., Gissi C., Godzik A., Gough J.,
RA Grimmond S., Gustincich S., Hirokawa N., Jackson I.J., Jarvis E.D.,
RA Kanai A., Kawaji H., Kawasawa Y., Kedzierski R.M., King B.L.,
RA Konagaya A., Kurochkin I.V., Lee Y., Lenhard B., Lyons P.A.,
RA Maglott D.R., Maltais L., Marchionni L., McKenzie L., Miki H.,
RA Nagashima T., Numata K., Okido T., Pavan W.J., Pertea G., Pesole G.,
RA Petrovsky N., Pillai R., Pontius J.U., Qi D., Ramachandran S.,
RA Ravasi T., Reed J.C., Reed D.J., Reid J., Ring B.Z., Ringwald M.,
RA Sandelin A., Schneider C., Semple C.A., Setou M., Shimada K.,
RA Sultana R., Takenaka Y., Taylor M.S., Teasdale R.D., Tomita M.,
RA Verardo R., Wagner L., Wahlestedt C., Wang Y., Watanabe Y., Wells C.,
RA Wilming L.G., Wynshaw-Boris A., Yanagisawa M., Yang I., Yang L.,
RA Yuan Z., Zavolan M., Zhu Y., Zimmer A., Carninci P., Hayatsu N.,
RA Hirozane-Kishikawa T., Konno H., Nakamura M., Sakazume N., Sato K.,
RA Shiraki T., Waki K., Kawai J., Aizawa K., Arakawa T., Fukuda S.,
RA Hara A., Hashizume W., Imotani K., Ishii Y., Itoh M., Kagawa I.,
RA Miyazaki A., Sakai K., Sasaki D., Shibata K., Shinagawa A.,
RA Yasunishi A., Yoshino M., Waterston R., Lander E.S., Rogers J.,
RA Birney E., Hayashizaki Y.;
RT "Analysis of the mouse transcriptome based on functional annotation of
RT 60,770 full-length cDNAs.";
RL Nature 420:563-573(2002).

RN [2] 

RP NUCLEOTIDE SEQUENCE [LARGE SCALE MRNA] (ISOFORM 1). 

rc strain=c57bl/6; TissuE=Brai n ; 

RX MEDLINE=22388257; PubMed-12477932 ; DOI=10.1073/pnas. 242603899; 

RA Strausberg R.L., Feingold E.A., Grouse L.H., Derge J.G.,
RA Klausner R.D., Collins F.S., Wagner L., Shenmen C.M., Schuler G.D.,
RA Altschul S.F., Zeeberg B., Buetow K.H., Schaefer C.F., Bhat N.K.,
RA Hopkins R.F., Jordan H., Moore T., Max S.I., Wang J., Hsieh F.,
RA Diatchenko L., Marusina K., Farmer A.A., Rubin G.M., Hong L.,
RA Stapleton M., Soares M.B., Bonaldo M.F., Casavant T.L., Scheetz T.E.,
RA Brownstein M.J., Usdin T.B., Toshiyuki S., Carninci P., Prange C.,
RA Raha S.S., Loquellano N.A., Peters G.J., Abramson R.D., Mullahy S.J.,
RA Bosak S.A., McEwan P.J., McKernan K.J., Malek J.A., Gunaratne P.H.,
RA Richards S., Worley K.C., Hale S., Garcia A.M., Gay L.J., Hulyk S.W.,
RA Villalon D.K., Muzny D.M., Sodergren E.J., Lu X., Gibbs R.A.,
RA Fahey J., Helton E., Ketteman M., Madan A., Rodrigues S., Sanchez A.,
RA Whiting M., Madan A., Young A.C., Shevchenko Y., Bouffard G.G.,
RA Blakesley R.W., Touchman J.W., Green E.D., Dickson M.C.,
RA Rodriguez A.C., Grimwood J., Schmutz J., Myers R.M.,
RA Butterfield Y.S.N., Krzywinski M.I., Skalska U., Smailus D.E.,
RA Schnerch A., Schein J.E., Jones S.J.M., Marra M.A.;

RT "Generation and initial analysis of more than 15,000 full-length human
RT and mouse cDNA sequences.";
RL Proc. Natl. Acad. Sci. U.S.A. 99:16899-16903(2002).

CC -!- ALTERNATIVE PRODUCTS:
CC     Event=Alternative splicing; Named isoforms=3;
CC     Name=1;
CC       IsoId=Q7TQC7-1; Sequence=Displayed;
CC     Name=2;
CC       IsoId=Q7TQC7-2; Sequence=VSP_010529;
CC       Note=No experimental confirmation available;
CC     Name=3;
CC       IsoId=Q7TQC7-3; Sequence=VSP_010530;
CC       Note=No experimental confirmation available;
CC -!- SIMILARITY: Contains 1 G-patch domain.
CC --------------------------------------------------------------------------
CC This Swiss-Prot entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use as long as its content is in no way modified and this statement is not
CC removed.
CC --------------------------------------------------------------------------
DR EMBL; AK030026; BAC26744.1; -; mRNA.
DR EMBL; AK053781; BAC35520.1; -; mRNA.
DR EMBL; AK083471; BAC38928.1; -; mRNA.
DR EMBL; BC054810; AAH54810.1; -; mRNA.
DR Ensembl; ENSMUSG00000039210; Mus musculus.
DR MGI; MGI:1915019; Gpatc2.
DR InterPro; IPR000467; G_patch.
DR Pfam; PF01585; G-patch; 1.
DR SMART; SM00443; G_patch; 1.
DR PROSITE; PS50174; G-PATCH; 1.
KW Alternative splicing.
FT DOMAIN    466  512  G-patch.
FT VARSPLIC    1   23  Missing (in isoform 2).
FT                     /FTId=VSP_010529.
FT VARSPLIC  388  425  DHWFSPGARTEHGQHQLLRDNRAERGHKKSCSLKTASR ->
FT                     E (in isoform 3).
FT                     /FTId=VSP_010530.
FT CONFLICT  251  251  D -> Y (in Ref. 1; BAC26744).
FT CONFLICT  367  367  S -> P (in Ref. 2).
SQ SEQUENCE 527 AA; 58218 MW; 4F4F29FA56BE06B7 CRC64;



Query Match 85.7%; score 2310; DB 1; Length 527;
Best Local similarity 84.9%; Pred. No. 1.7e-130;
Matches 428; Conservative 35; Mismatches 41; Indels 0; Gaps 0;

Qy   1 MEELVHDLVSALEESSEQARGGFAETGDHSRSISCPLKRQARKRRGRKRRSYNVHHPWET 60
Db  24 MEELVHDLVSALEESSEQARGGFAETGEHSRNLSCPLKRQARKRRGRKRRSYNVHHPWET 83

Qy  61 GHCLSEGSDSSLEEPSKDYRENHNNNKKDHSDSDDQMLVAKRRPSSNLNNNVRGKRPLWH 120
Db  84 GHCLSEGSDSSLEEPSKDYREKHSNNKKDRSDSDDQMLVAKRRPSSNLSSSVRGKRLLWH 143

Qy 121 ESDFAVDNVGNRTLRRRRKVKRMAVDLPQDISNKRTMTQPPEGCRDQDMDSDRAYQYQEF 180
Db 144 ESDFAVDSLGNRTLRRRRKVKRMAVDLPQDVSSKRTMTQLPEGCRDQDMDNDRASQYPEF 203

Qy 181 TKNKVKKRKLKIIRQGPKIQDEGWLESEETNQTNKDKMECEEQKVSDELMSESDSSSLS 240
Db 204 TRKKVKKRKLKGIRPGPKTQEEGGVLESEERSQPNKDRMEYEEQKASDELRSESDTSSLS 263

Qy 241 STDAGLFTNDEGRQGDDEQSDWFYEKESGGACGITGWPWWEKEDPTELDKNVPDPVFES 300
Db 264 STDAGLFTNDEGRQGDDEQSDWFYEKESGGACGIAGWPWWEKDEPAELDTNLPDPVFES 323

Qy 301 ILTGSFPLMSHPSRRGFQARLSRLHGMSSKNIKKSGGTPTSMVPIPGPVGNKRMVHFSPD 360
Db 324 ILSGSFPLMSHPGRGGFQARLSRLHGTPSKNIKKSSGAPPSMLSAPGPGSNKRMVHFSPD 383

Qy 361 SHHHDHWFSPGARTEHDQHQLLRDNRAERGHKKNCSVRTASRQTSMHLGSLCTGDIKRRR 420
Db 384 AHRHDHWFSPGARTEHGQHQLLRDNRAERGHKKSCSLKTASRQTSMHLGSLCTGDIKRRR 443

Qy 421 KAAPLPGPTTAGFVGENAQPILENNIGNRMLQNMGWTPGSGLGRDGKGISEPIQAMQRPK 480
Db 444 KAAPLPGPTAAGIVGENAQPILESNIGNRMLQSMGWTPGSGLGRDGRGIAEPVQAVQRPK 503

Qy 481 GLGLGFPLPKSTSATTTPNAGKSA 504
Db 504 GLGLGFPLPKSSPTSPAPTSGNPA 527